We consider the problem of choosing the optimal (in the sense of mean-squared prediction error) multistep predictor for an autoregressive (AR) process of finite but unknown order. If a working AR model (which is possibly misspecified) is adopted for multistep predictions, then two competing types of multistep predictors (i.e., plug-in and direct predictors) can be obtained from this model. We provide some interesting examples to show that when both plug-in and direct predictors are considered, the optimal multistep prediction results cannot be guaranteed by correctly identifying the underlying model's order. This finding challenges the traditional model (order) selection criteria, which usually aim to choose the order of the true model. A new prediction selection criterion, which attempts to seek the best combination of the prediction order and the prediction method, is proposed to rectify this difficulty. When the underlying model is stationary, the validity of the proposed criterion is justified theoretically. To obtain this result, asymptotic properties of accumulated squares of multistep prediction errors are investigated. In addition to overcoming the above difficulty, some other advantages of the proposed criterion are also mentioned.
C.-K. ING

1. Introduction and overview. In recent years there has been growing interest in the study of multistep prediction in various time series models [e.g., Findley (1984), Xiao and Xu (1993), Bhansali (1996, 1997), Haywood and Tunnicliffe-Wilson (1997), Hurvich and Tsai (1997), Findley, Pötscher and Wei (2001, 2003) and Ing (2003), among others]. Through these previous efforts, some new parameter estimation, prediction and model selection theories related to this research topic have been established. However, the problem of how to choose models to minimize the multistep mean-squared prediction error (MSPE) has still not been clarified, even for autoregressive (AR) processes. This motivated our study.

To fix ideas, let us assume that observations $x_1, \ldots, x_n$ are generated from the stationary AR model

$$x_{t+1} = \sum_{i=1}^{p_1} a_i x_{t+1-i} + \varepsilon_{t+1}, \tag{1.1}$$

where $1 \le p_1 < \infty$ is unknown, $a_{p_1} \ne 0$, the $\varepsilon_t$'s are (unobservable) uncorrelated random noises with zero mean and common variance $\sigma^2$, and the characteristic polynomial $A(z) = 1 - a_1 z - \cdots - a_{p_1} z^{p_1}$ has no zeros inside or on the unit circle. This last assumption implies that $x_{t+1}$ has a one-sided infinite moving-average representation

$$x_{t+1} = \sum_{j=0}^{\infty} b_j \varepsilon_{t+1-j},$$

where $b_0 = 1$ and $|b_j| \le c_0 e^{-c_1 j}$ for $j \ge 1$ and some positive numbers $c_0$ and $c_1$. For later reference we also define the parameter space of interest:

$$\mathcal{A} = \{(d_1, \ldots, d_{p_1})' : -\infty < d_i < \infty \text{ for } 1 \le i \le p_1 \text{ and } 1 - d_1 z - \cdots - d_{p_1} z^{p_1} \ne 0 \text{ for any complex number } |z| \le 1\}.$$

To predict $x_{n+h}$, $h \ge 1$, under the situation where $p_1$ is unknown, it is common to use a working AR model, which is possibly misspecified, to replace the true underlying AR($p_1$) model. Then a natural predictor of $x_{n+h}$ can be obtained by repeatedly using the fitted (by least squares) working model with the unknown future values replaced by their own forecasts. In the following discussion this predictor is referred to as the plug-in predictor. More specifically, let the order of the working AR model be denoted by $k$ and let the least-squares estimator of the coefficient vector in the working model be denoted by $\hat{\mathbf a}_n(1,k) = (\hat a_{1,n}(k), \ldots, \hat a_{k,n}(k))'$, where $\hat{\mathbf a}_n(1,k)$ satisfies

$$\hat\Gamma_n(1,k)\,\hat{\mathbf a}_n(1,k) = \frac{1}{n-k}\sum_{j=k}^{n-1}\mathbf x_j(k)\,x_{j+1},$$

with $\mathbf x_j(k) = (x_j, \ldots, x_{j-k+1})'$ and

$$\hat\Gamma_n(h,k) = \frac{1}{n-h-k+1}\sum_{j=k}^{n-h}\mathbf x_j(k)\mathbf x_j'(k).$$

Then, for $h \ge 1$, the plug-in predictor can be expressed by

$$\hat x_{n+h}(k) = \mathbf x_n'(k)\,\hat{\mathbf a}_n(h,k), \tag{1.2}$$

where $\hat{\mathbf a}_n(h,k) = (\hat A_n'(k))^{h-1}\hat{\mathbf a}_n(1,k)$ and, with $I_m$ and $\mathbf 0_m$, respectively, denoting an identity matrix and a vector of zeros of dimension $m$,

$$\hat A_n(k) = \begin{pmatrix} \hat{\mathbf a}_n'(1,k) \\ I_{k-1} \quad \mathbf 0_{k-1} \end{pmatrix}.$$

(Note that $(\hat A_n'(k))^0 = I_k$.) On the other hand, the direct predictor of $x_{n+h}$, $\tilde x_{n+h}(k)$, suggested by Findley (1984), is also frequently used as an alternative, where $\tilde x_{n+h}(k)$ is obtained through a linear least-squares regression of $x_{t+h}$ on $x_t, \ldots, x_{t-k+1}$, that is,

$$\tilde x_{n+h}(k) = \mathbf x_n'(k)\,\tilde{\mathbf a}_n(h,k), \tag{1.3}$$

where $\tilde{\mathbf a}_n(h,k)$ satisfies

$$\hat\Gamma_n(h,k)\,\tilde{\mathbf a}_n(h,k) = \frac{1}{n-h-k+1}\sum_{j=k}^{n-h}\mathbf x_j(k)\,x_{j+h}.$$
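The two predictors in (1.2) and (1.3) can be made concrete with a short pure-Python sketch. It assumes the series is a plain list with `x[0]` standing for $x_1$; all function names (`ls_coef`, `plug_in`, `plug_in_closed_form`, `direct`) are illustrative choices of ours, not code from the paper. `plug_in` iterates the fitted AR($k$) recursion with forecasts substituted for unknown future values, while `plug_in_closed_form` evaluates $\mathbf x_n'(k)(\hat A_n'(k))^{h-1}\hat{\mathbf a}_n(1,k)$ directly; the two should agree up to rounding.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for j in range(c, m + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(M[r][j] * z[j] for j in range(r + 1, m))
        z[r] = (M[r][m] - s) / M[r][r]
    return z


def lags(x, j, k):
    """x_j(k) = (x_j, x_{j-1}, ..., x_{j-k+1})' for a 0-based index j."""
    return [x[j - i] for i in range(k)]


def ls_coef(x, h, k):
    """Least-squares regression of x_{j+h} on x_j(k) over all usable j.

    The 1/(n-h-k+1) factors on both sides of the normal equations cancel.
    """
    G = [[0.0] * k for _ in range(k)]
    g = [0.0] * k
    for j in range(k - 1, len(x) - h):
        v = lags(x, j, k)
        for r in range(k):
            g[r] += v[r] * x[j + h]
            for c in range(k):
                G[r][c] += v[r] * v[c]
    return solve(G, g)


def plug_in(x, h, k):
    """Plug-in predictor (1.2): iterate the fitted one-step AR(k) model h times."""
    a = ls_coef(x, 1, k)
    path = list(x[-k:])
    for _ in range(h):
        path.append(sum(a[i] * path[-1 - i] for i in range(k)))
    return path[-1]


def plug_in_closed_form(x, h, k):
    """Same predictor via x_n'(k) (A_n')^{h-1} a_n(1,k), A_n the companion matrix."""
    a = ls_coef(x, 1, k)
    coef = list(a)  # (A_n')^0 a_n(1,k)
    for _ in range(h - 1):
        # A_n' v = v[0] * a + (v[1], ..., v[k-1], 0)'
        coef = [coef[0] * a[i] + (coef[i + 1] if i + 1 < k else 0.0) for i in range(k)]
    return sum(c * v for c, v in zip(coef, lags(x, len(x) - 1, k)))


def direct(x, h, k):
    """Direct predictor (1.3): regress x_{t+h} on x_t(k), apply the fit to x_n(k)."""
    a = ls_coef(x, h, k)
    return sum(a[i] * x[-1 - i] for i in range(k))
```

For $h = 1$, `plug_in` and `direct` run the same regression and return the same value, mirroring the fact that the plug-in and direct predictors coincide when $h = 1$.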

In view of (1.2) and (1.3), the plug-in and direct predictors are clearly identical when $h = 1$. For $h \ge 2$, Ing [(2003), Theorems 1 and 2] showed that the plug-in predictor has an advantage over the direct predictor in situations where the order of the working model, $k$, is not less than $p_1$. More specifically, when $h \ge 2$ and $k \ge p_1$, the MSPE of the plug-in predictor,

$$\mathrm{MSPEP}_{n,h}(k) = E(x_{n+h} - \hat x_{n+h}(k))^2,$$

and that of the direct predictor,

$$\mathrm{MSPED}_{n,h}(k) = E(x_{n+h} - \tilde x_{n+h}(k))^2,$$

have the property

$$\lim_{n\to\infty}\frac{\mathrm{MSPED}_{n,h}(k) - \sigma_h^2}{\mathrm{MSPEP}_{n,h}(k) - \sigma_h^2} > 1, \tag{1.4}$$

where $\sigma_h^2 = \sigma^2\sum_{j=0}^{h-1}b_j^2$. Therefore, $\hat x_{n+h}(k)$ is asymptotically more efficient than $\tilde x_{n+h}(k)$ when $k \ge p_1$ and $h \ge 2$. For more details, see (2.2)-(2.4) of Section 2. Ing (2003) also compared the prediction efficiencies of $\hat x_{n+h}(k)$ and $\hat x_{n+h}(k+1)$ and those of $\tilde x_{n+h}(k)$ and $\tilde x_{n+h}(k+1)$ for $k \ge p_1$. Under certain conditions it was shown in Theorem 3 of Ing (2003) (see also Theorem 2.3 of Section 2) that

$$\lim_{n\to\infty}\frac{\mathrm{MSPEP}_{n,h}(k+1) - \sigma_h^2}{\mathrm{MSPEP}_{n,h}(k) - \sigma_h^2} > 1 \tag{1.5}$$

and

$$\lim_{n\to\infty}\frac{\mathrm{MSPED}_{n,h}(k+1) - \sigma_h^2}{\mathrm{MSPED}_{n,h}(k) - \sigma_h^2} > 1 \tag{1.6}$$
hold for $h \ge 1$ and $k \ge p_1$. Inequalities (1.4)-(1.6) suggest that, from the MSPE point of view, $\hat x_{n+h}(p_1)$ seems to be the optimal choice among two competing families of candidate predictors,

$$\text{family I} = \{\hat x_{n+h}(1), \ldots, \hat x_{n+h}(K)\}$$

and

$$\text{family II} = \{\tilde x_{n+h}(1), \ldots, \tilde x_{n+h}(K)\},$$

where $K$ is known to satisfy $K \ge p_1$. [Note that we sometimes use $(k,1)$ to denote $\hat x_{n+h}(k)$ and $(k,2)$ to denote $\tilde x_{n+h}(k)$.] Surprisingly, when $h \ge 2$ this conjecture is not true, provided $(a_1, \ldots, a_{p_1})'$ falls into some nonempty subset of $\mathcal A$.

To see this, let us begin with the linear predictor of $x_{t+h}$, $h \ge 1$, based on the infinite past, $x_{t-j}$, $j \ge 0$, with the smallest MSPE. Let this predictor be denoted by $\bar x_{t+h}$. Then we have

$$\bar x_{t+h} = \sum_{j=1}^{p_h} a_j(h,p_h)\,x_{t+1-j},$$

where $a_{p_h}(h,p_h) \ne 0$ and

$$(a_1(h,p_h), \ldots, a_{p_h}(h,p_h))' = \mathbf a_D(h,p_h)$$

with $\mathbf a_D(h,k) = \Gamma^{-1}(k)(\gamma_h, \ldots, \gamma_{h+k-1})'$, $\Gamma(k) = E(\mathbf x_1(k)\mathbf x_1'(k))$ and $\gamma_j = E(x_t x_{t-j})$. We also have

$$x_{t+h} = \bar x_{t+h} + \eta_{t,h}, \tag{1.7}$$

where $\eta_{t,h} = \sum_{j=0}^{h-1}b_j\varepsilon_{t+h-j}$. Model (1.7) is referred to as the $h$-step prediction model that corresponds to model (1.1) [note that when $h = 1$, $a_j(1,p_1) = a_j$ for $j = 1, \ldots, p_1$]. One notable but often disregarded feature of model (1.7) is that when $h > 1$, $p_h$ can be strictly less than $p_1$ and can vary with $h$. For example, if $p_1 = 2$, then the corresponding two-step prediction model is

$$x_{t+2} = (a_1^2 + a_2)x_t + a_1a_2x_{t-1} + \varepsilon_{t+2} + a_1\varepsilon_{t+1}.$$

Hence $p_2 = 1 < p_1$ if $a_1 = 0$. A similar situation also arises in the three-step prediction case, provided that $a_1^2 + a_2 = 0$. This phenomenon can occur even if all parameters in the one-step prediction model are large in magnitude. It also creates some unexpected difficulties in assessing the performances of the plug-in and direct predictors.
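The drop in order described above can be checked mechanically. The sketch below (function names are our own, for illustration) computes the $h$-step coefficients of $\bar x_{t+h}$ as the first row of $A^h$, where $A$ is the companion matrix of $(a_1, \ldots, a_p)$, together with $b_j$, the $(1,1)$ entry of $A^j$. It assumes a stationary AR($p$) with known coefficients.

```python
def h_step(a, h):
    """Return (h-step coefficients on x_t, ..., x_{t-p+1}, [b_0, ..., b_{h-1}]).

    The h-step coefficients are the first row of A^h, A being the companion
    matrix with first row a; b_j is the (1,1) entry of A^j.
    """
    p = len(a)
    row = [1.0] + [0.0] * (p - 1)  # e_1' A^0
    b = []
    for _ in range(h):
        b.append(row[0])
        # row <- row A, using A[0] = a and A[r][r-1] = 1 for r >= 1
        row = [row[0] * a[c] + (row[c + 1] if c + 1 < p else 0.0) for c in range(p)]
    return row, b


def effective_order(coef, tol=1e-12):
    """Largest lag with a nonzero h-step coefficient, i.e. p_h."""
    nonzero = [j + 1 for j, c in enumerate(coef) if abs(c) > tol]
    return max(nonzero) if nonzero else 0


# a_1 = 0.9, a_2 = -0.81 satisfies a_1^2 + a_2 = 0, so p_3 = 1 although p_1 = 2.
coef3, b = h_step([0.9, -0.81], 3)
print(coef3, b, effective_order(coef3))
```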

Note that when $p_h < p_1$ it seems more interesting to compare the performances of $\hat x_{n+h}(p_1)$ and $\tilde x_{n+h}(p_h)$ rather than those of $\hat x_{n+h}(k)$ and $\tilde x_{n+h}(k)$. In Section 2, some interesting examples are given to show that when $p_h < p_1$ and $h \ge 2$,

$$\lim_{n\to\infty}\frac{\mathrm{MSPED}_{n,h}(p_h) - \sigma_h^2}{\mathrm{MSPEP}_{n,h}(p_1) - \sigma_h^2} < 1 \tag{1.8}$$
can occur. Moreover, since the value of the above limit depends on unknown parameters, it is not possible to determine the rankings of $\hat x_{n+h}(p_1)$ and $\tilde x_{n+h}(p_h)$ from the point of view of MSPE. This phenomenon leads us to a fundamental problem in selecting multistep predictors: instead of using the multistep predictor obtained by identifying the order of the one-step prediction model, can a multistep predictor be constructed to minimize the multistep MSPE directly? As mentioned, this problem is complicated when both families I and II are considered. In this situation, the prediction order and the prediction method must be taken into account simultaneously.

This article aims to resolve the above problem. The strategy adopted herein is to find a statistic for each $\mathrm{MSPEP}_{n,h}(k)$ and $\mathrm{MSPED}_{n,h}(k)$, $k = 1, \ldots, K$, and to show that the ordering of these statistics coincides with the ordering of their corresponding multistep MSPEs. To achieve this goal, we consider the multistep generalizations of accumulated prediction errors (APEs) based on sequential plug-in and direct predictors, namely,

$$\mathrm{APEP}_{n,h}(k) = \sum_{i=m_h}^{n-h}(x_{i+h} - \hat x_{i+h}(k))^2 \tag{1.9}$$

and

$$\mathrm{APED}_{n,h}(k) = \sum_{i=m_h}^{n-h}(x_{i+h} - \tilde x_{i+h}(k))^2, \tag{1.10}$$

respectively. Here $\hat x_{i+h}(k)$ and $\tilde x_{i+h}(k)$ are the predictors (1.2) and (1.3) computed from $x_1, \ldots, x_i$ only, and $m_h$ denotes the smallest positive number such that $\hat{\mathbf a}_i(h,K)$ and $\tilde{\mathbf a}_i(h,K)$ are well defined for all $i \ge m_h$. Note that the APE with $h = 1$, namely, $\mathrm{APEP}_{n,1}(k) = \mathrm{APED}_{n,1}(k)$, was first proposed by Rissanen (1986) for the purpose of determining $p_1$. Subsequently, the statistical properties of $\mathrm{APEP}_{n,1}(k)$ were investigated by Wei (1987, 1992) in stochastic regression models, which include model (1.1) as a special case. However, as indicated in Section 3, Wei's approach cannot be directly applied to the case of $h \ge 2$. Theorems 3.1 and 3.2 (also in Section 3) are devoted to dealing with this difficulty. In particular, the results obtained in these theorems show that the ordering of the multistep MSPEs of the predictors in families I and II is preserved by their corresponding multistep APEs when $n$ is sufficiently large. Based on this finding, we propose the following predictor selection procedure $(\hat k_n, \hat j_n)$, where $1 \le \hat k_n \le K$ and $1 \le \hat j_n \le 2$ (recall that $\hat k_n$ denotes the prediction order and $\hat j_n$ denotes the method of prediction):

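Step 1. Define $\hat k^{(1)}_{D,n} = \arg\min_{1 \le k \le K}\mathrm{APED}_{n,1}(k)$.

Step 2. Define

$$\hat k^{(h)}_{D,n} = \arg\min_{1 \le k \le K}\mathrm{APED}_{n,h}(k)$$

and define

$$\hat k^{(h)}_{P,n} = \arg\min_{\hat k^{(1)}_{D,n} \le k \le K}\mathrm{APEP}_{n,h}(k).$$

Step 3. If $\mathrm{APED}_{n,h}(\hat k^{(h)}_{D,n}) > \mathrm{APEP}_{n,h}(\hat k^{(h)}_{P,n})$, then $(\hat k_n, \hat j_n) = (\hat k^{(h)}_{P,n}, 1)$; otherwise $(\hat k_n, \hat j_n) = (\hat k^{(h)}_{D,n}, 2)$.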
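The APEs (1.9) and (1.10) that drive these steps are "honest" sequential quantities: each term uses only the data available at time $i$. The following pure-Python sketch computes them for a list `x` (with `x[0]` standing for $x_1$). The starting index `m` plays the role of $m_h$ and must be chosen large enough for every regression to be well defined; that choice, like all function names here, is an assumption of the sketch.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for j in range(c, m + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * m
    for r in range(m - 1, -1, -1):
        z[r] = (M[r][m] - sum(M[r][j] * z[j] for j in range(r + 1, m))) / M[r][r]
    return z


def ls_coef(x, h, k):
    """Least-squares regression of x_{j+h} on (x_j, ..., x_{j-k+1})."""
    G = [[0.0] * k for _ in range(k)]
    g = [0.0] * k
    for j in range(k - 1, len(x) - h):
        v = [x[j - i] for i in range(k)]
        for r in range(k):
            g[r] += v[r] * x[j + h]
            for c in range(k):
                G[r][c] += v[r] * v[c]
    return solve(G, g)


def plug_in(x, h, k):
    a = ls_coef(x, 1, k)
    path = list(x[-k:])
    for _ in range(h):
        path.append(sum(a[i] * path[-1 - i] for i in range(k)))
    return path[-1]


def direct(x, h, k):
    a = ls_coef(x, h, k)
    return sum(a[i] * x[-1 - i] for i in range(k))


def ape(x, h, k, m, method):
    """sum_{i=m}^{n-h} (x_{i+h} - prediction from x_1..x_i)^2, with 1-based i."""
    if method not in ("plug-in", "direct"):
        raise ValueError("method must be 'plug-in' or 'direct'")
    predictor = plug_in if method == "plug-in" else direct
    total = 0.0
    for i in range(m, len(x) - h + 1):
        past = x[:i]                   # x_1, ..., x_i
        total += (x[i + h - 1] - predictor(past, h, k)) ** 2
    return total
```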

We show in Theorem 3.4 of Section 3 that, with probability 1, $(\hat k_n, \hat j_n)$ ultimately chooses the best predictor among families I and II regardless of whether $p_h < p_1$ or $p_h = p_1$. This property is referred to as asymptotic efficiency; see Section 3 for the explicit definition. Moreover, $p_1$ can also be consistently estimated by $\hat k^{(1)}_{D,n}$.
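Once the three families of APE values are available, Steps 1-3 reduce to a few comparisons. The sketch below assumes the APEs have already been computed and are passed in as dictionaries keyed by the order $k$; the function name and the tie-breaking rule (Python's `min` keeps the smallest $k$ among ties) are our assumptions, since the procedure above does not specify how ties in the argmin are resolved.

```python
def select_predictor(aped1, apedh, apeph):
    """Steps 1-3 of the selection procedure (a sketch).

    aped1[k] = APED_{n,1}(k), apedh[k] = APED_{n,h}(k), apeph[k] = APEP_{n,h}(k)
    for k = 1, ..., K.  Returns (k_n, j_n): j_n = 1 means plug-in, 2 means direct.
    """
    orders = sorted(aped1)
    # Step 1: the one-step APE order, a consistent estimate of p_1.
    k1 = min(orders, key=lambda k: aped1[k])
    # Step 2: best direct order over all k; best plug-in order over k >= k1 only,
    # so that underspecified plug-in predictors are excluded.
    k_d = min(orders, key=lambda k: apedh[k])
    k_p = min((k for k in orders if k >= k1), key=lambda k: apeph[k])
    # Step 3: choose the plug-in predictor only if its APE is strictly smaller.
    if apedh[k_d] > apeph[k_p]:
        return k_p, 1
    return k_d, 2


# Hypothetical APE values: the small plug-in APE at k = 1 is ignored because
# Step 1 selects k = 2 as the estimate of p_1.
print(select_predictor({1: 10.0, 2: 5.0, 3: 5.5},
                       {1: 7.0, 2: 8.0, 3: 9.0},
                       {1: 1.0, 2: 7.5, 3: 8.0}))  # (1, 2)
```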

It is worth noting that this article offers more than a treatment of the difficulty caused by (1.8). (1) To the author's knowledge, $(\hat k_n, \hat j_n)$ is the first criterion that is designed to choose the optimal multistep predictor from the "honest" prediction point of view. By honest prediction, we mean the prediction for the future of the observed time series; see Rissanen (1987, 1989) for details. In the context of time series, most model selection criteria for prediction are obtained or justified under the assumption that the processes used for estimation and for prediction are independent; see, for example, final prediction error [FPE; Akaike (1969)], Akaike information criterion [AIC; Akaike (1974)] and $S_n(k)$ [Shibata (1980)]. However, this type of prediction, which differs from Rissanen's idea, does not seem to be natural for time series analysis; see also Ing and Wei (2004). Recently, Ing and Wei (2004) established the optimality of AIC for honest predictions (referred to as same-realization predictions in their article) in stationary AR($\infty$) processes. However, because their main concern was the case of one-step predictions, they did not deal with the problem of choosing the optimal combination of prediction order and prediction method. (2) This article shows that accumulated squares of sequential prediction errors can be used to choose a good predictor even in certain nonstandard situations. The sequential prediction error of $\mathrm{APEP}_{n,h}(k)$ with $h \ge 2$ involves a nonlinear transformation of the one-step least-squares estimators. While the sequential prediction error of $\mathrm{APED}_{n,h}(k)$ with $h \ge 2$ is obtained directly from ($h$-step) least squares, its martingale structure no longer exists [see the discussion after (3.6)]. These nonstandard situations, which are not encountered with the one-step APE, challenge the validity of the multistep generalizations of APE for model (predictor) selection. By establishing the asymptotic efficiency of $(\hat k_n, \hat j_n)$, we clarify this ambiguity.

This article is organized as follows. In Section 2, some preliminary results from Ing (2003) and some examples that motivated this work are introduced. The asymptotic efficiency of $(\hat k_n, \hat j_n)$ is established in Section 3. In Section 4, an extension of the proposed criterion to subset autoregressions is given. Concluding remarks are given in Section 5. Some technical results, which are useful for obtaining the asymptotic expression of $\mathrm{APEP}_{n,h}(k)$ with $k \ge p_1$, are provided in the Appendix.

2. Preliminary results and motivating examples. Throughout this section, it is assumed that in model (1.1) the $\varepsilon_t$'s are i.i.d. random variables with mean 0 and variance $\sigma^2 > 0$. We also assume that the distribution function of $\varepsilon_1$, $F(\cdot)$, has the property that, for some positive numbers $\alpha$, $\eta$ and $M$,

$$|F(x) - F(y)| \le M|x - y|^{\alpha}, \tag{2.1}$$

provided $|x - y| \le \eta$. Theorems 2.1 and 2.2 provide asymptotic expressions for $\mathrm{MSPEP}_{n,h}(k)$ and $\mathrm{MSPED}_{n,h}(k)$ with $k \ge p_1$, respectively. Their proofs can be found in Theorems 1 and 2 of Ing (2003).

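Theorem 2.1. Assume that $\{x_t\}$ satisfies model (1.1). Also assume that $\{\varepsilon_t\}$ satisfies (2.1) and

$$E(|\varepsilon_1|^{\theta_h}) < \infty,$$

where $\theta_h = \max\{8, 2(h+1)\} + \delta$ for some $\delta > 0$. Then, for $k \ge p_1$ and $h \ge 1$,

$$n(\mathrm{MSPEP}_{n,h}(k) - \sigma_h^2) = f_{1,h}(k) + O(n^{-1/2}), \tag{2.2}$$

where $f_{1,h}(k) = \mathrm{tr}(\Gamma(k)L_h(k)\Gamma^{-1}(k)L_h'(k))\sigma^2$ with $L_h(k) = \sum_{j=0}^{h-1}b_j(A'(k))^{h-1-j}$,

$$A(k) = \begin{pmatrix} \mathbf a_D'(1,k) \\ I_{k-1} \quad \mathbf 0_{k-1} \end{pmatrix}$$

and $A^0(k) = I_k$.

Theorem 2.2. Let the assumptions of Theorem 2.1 hold, with $\theta_h$ replaced by $8 + \delta$ for some $\delta > 0$. Then, for $k \ge p_1$ and $h \ge 1$,

$$n(\mathrm{MSPED}_{n,h}(k) - \sigma_h^2) = f_{2,h}(k) + O(n^{-1/2}), \tag{2.3}$$

where $f_{2,h}(k) = \mathrm{tr}\{\Gamma^{-1}(k)\,\mathrm{cov}(\sum_{j=0}^{h-1}b_j\mathbf x_j(k))\}\sigma^2$ and, for a random vector $\mathbf y$, $\mathrm{cov}(\mathbf y) = E\{(\mathbf y - E(\mathbf y))(\mathbf y - E(\mathbf y))'\}$.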
Bhansali [(1997), Proposition 3.2] showed that for $k \ge p_1 \ge 1$ and $h \ge 2$,

$$\frac{f_{2,h}(k)}{f_{1,h}(k)} > 1. \tag{2.4}$$

Therefore, if $k \ge p_1 \ge 1$ and $h \ge 2$, then $\hat x_{n+h}(k)$ is asymptotically more efficient than $\tilde x_{n+h}(k)$ in the sense of (1.4). For example, assume $h = 2$ and $k \ge p_1 \ge 1$. Then

$$f_{2,2}(k) = \{k + (k+2)a_1^2\}\sigma^2 \tag{2.5}$$

and

$$f_{1,2}(k) = \{(k+2)a_1^2 + k - 1 + a_k^2\}\sigma^2. \tag{2.6}$$

(Note that $|a_{p_1}| < 1$ and $a_k = 0$ for $k > p_1$.) Hence, for $k \ge p_1$,

$$\lim_{n\to\infty}\frac{\mathrm{MSPED}_{n,2}(k) - \sigma_2^2}{\mathrm{MSPEP}_{n,2}(k) - \sigma_2^2} = 1 + \frac{1 - a_k^2}{(k+2)a_1^2 + k - 1 + a_k^2}.$$
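As a numerical cross-check of (2.5) and (2.6), the sketch below evaluates the trace expressions of Theorems 2.1 and 2.2 for $h = 2$, namely $f_{1,2}(k) = \mathrm{tr}(\Gamma(k)L_2(k)\Gamma^{-1}(k)L_2'(k))\sigma^2$ with $L_2(k) = A'(k) + a_1I_k$, and $f_{2,2}(k) = \mathrm{tr}\{\Gamma^{-1}(k)\,\mathrm{cov}(\mathbf x_0(k) + a_1\mathbf x_1(k))\}\sigma^2$, and compares them with the closed forms. It assumes an AR(2) process with $\sigma^2 = 1$ and uses the standard AR(2) autocovariance formula; the values $(a_1, a_2) = (0.5, 0.3)$ are an arbitrary stationary choice, not taken from the text.

```python
def ar2_autocov(a1, a2, lags, sigma2=1.0):
    """gamma_0, ..., gamma_lags of a stationary AR(2)."""
    g0 = sigma2 * (1 - a2) / ((1 + a2) * ((1 - a2) ** 2 - a1 ** 2))
    g = [g0, a1 * g0 / (1 - a2)]
    while len(g) <= lags:
        g.append(a1 * g[-1] + a2 * g[-2])
    return g


def inv(M):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(n):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [row[n:] for row in A]


def mul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def tr(X):
    return sum(X[i][i] for i in range(len(X)))


def T(X):
    return [list(r) for r in zip(*X)]


def gamma_matrix(g, k):
    return [[g[abs(r - c)] for c in range(k)] for r in range(k)]


def f12_trace(a1, a2, k):
    """f_{1,2}(k) / sigma^2 via the trace formula of Theorem 2.1 (k >= 2)."""
    G = gamma_matrix(ar2_autocov(a1, a2, k), k)
    first = ([a1, a2] + [0.0] * k)[:k]          # a_D(1, k), zero-padded
    A = [first] + [[1.0 if c == r - 1 else 0.0 for c in range(k)] for r in range(1, k)]
    L = [[A[c][r] + (a1 if r == c else 0.0) for c in range(k)] for r in range(k)]  # A' + b_1 I
    return tr(mul(mul(mul(G, L), inv(G)), T(L)))


def f22_trace(a1, a2, k):
    """f_{2,2}(k) / sigma^2 via the trace formula of Theorem 2.2 (k >= 2)."""
    g = ar2_autocov(a1, a2, k)
    G = gamma_matrix(g, k)
    C = [[g[abs(1 + r - c)] for c in range(k)] for r in range(k)]  # E(x_0(k) x_1(k)')
    cov = [[(1 + a1 ** 2) * G[r][c] + a1 * (C[r][c] + C[c][r]) for c in range(k)]
           for r in range(k)]
    return tr(mul(inv(G), cov))
```

For instance, `f22_trace(0.5, 0.3, 3)` should return $3 + 5(0.25) = 4.25$ up to rounding, in agreement with (2.5).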

The following theorem shows that $f_{1,h}(k)$ and $f_{2,h}(k)$ with $k \ge p_1$ are strictly increasing functions of $k$.

Theorem 2.3. (i) Assume $h \ge 1$ and $k \ge p_1$. Then

$$\frac{f_{1,h}(k+1)}{f_{1,h}(k)} > 1, \tag{2.7}$$

provided

$$b_{h-1} \ne 0 \tag{2.8}$$

or

$$\mathbf l^* \ne \mathbf 0_{k+1}, \tag{2.9}$$

where, with the convention that $b_j = 0$ for $j < 0$, $\mathbf l^* = (\sum_{i=0}^{h-1}b_{h-1-k-i}b_i, \ldots, \sum_{i=0}^{h-1}b_{h-1-i}b_i)'$ is a $(k+1)$-dimensional vector.
(ii) Assume $h \ge 1$ and $k \ge p_1$. Then

$$\frac{f_{2,h}(k+1)}{f_{2,h}(k)} > 1. \tag{2.10}$$

Remark 1. A proof of Theorem 2.3 can be found in Ing [(2003), Theorem 3]. When $1 \le h \le 5$, it can be shown that either (2.8) or (2.9) holds for all $k \ge p_1$, and hence (2.7) holds without extra constraints on the parameter space. However, for general $h$ (especially when $h > k$), we are not able to establish (2.7) without condition (2.8) or (2.9). For more details on these conditions, see Ing [(2003), Remark 2].
As immediate consequences of Theorems 2.1-2.3, we obtain (1.5) and (1.6). Inequalities (1.4)-(1.6) seem to suggest that

$$\lim_{n\to\infty}\frac{\mathrm{MSPEP}_{n,h}(p_1) - \sigma_h^2}{E(x_{n+h} - \check x_{n+h}(k))^2 - \sigma_h^2} \le 1, \tag{2.11}$$

where $\check x_{n+h}(k)$ is any predictor in family I or II. However, as indicated by Remark 1, when $h$ is large, (2.7) cannot be guaranteed without (2.8) or (2.9). Therefore, it is not clear whether (2.11) still holds in the situation where both (2.8) and (2.9) are violated. Moreover, we will show that (2.11) can fail when $p_h < p_1$. To see this, let us begin with a simple extension of Theorem 2.2, which provides an asymptotic expression for $\mathrm{MSPED}_{n,h}(k)$ with $k \ge p_h$.

Corollary 2.4. Let the assumptions of Theorem 2.2 hold. Then (2.3) holds with $k \ge p_h$ and $h \ge 1$.

Since Corollary 2.4 can be shown by an argument similar to that used to show Theorem 2.2, we omit the details. When $p_h < p_1$, it would be more interesting to compare

$$\lim_{n\to\infty}n(\mathrm{MSPEP}_{n,h}(p_1) - \sigma_h^2) \quad\text{and}\quad \lim_{n\to\infty}n(\mathrm{MSPED}_{n,h}(p_h) - \sigma_h^2)$$

rather than

$$\lim_{n\to\infty}n(\mathrm{MSPEP}_{n,h}(k) - \sigma_h^2) \quad\text{and}\quad \lim_{n\to\infty}n(\mathrm{MSPED}_{n,h}(k) - \sigma_h^2).$$

The following two examples show that the advantage of the plug-in predictor can vanish in this kind of comparison.

Example 1. Let $h = 2$ and $p_2 < p_1$. Then we see that $b_1 = a_1 = 0$ and $p_2 = p_1 - 1$. This fact and Corollary 2.4 yield that $f_{2,2}(p_1) - f_{2,2}(p_2) = \sigma^2$. On the other hand, by (2.5) and (2.6) we have $f_{2,2}(p_1) - f_{1,2}(p_1) = (1 - a_{p_1}^2)\sigma^2$. Therefore, $f_{1,2}(p_1) - f_{2,2}(p_2) = a_{p_1}^2\sigma^2 > 0$. As a result, we have, for $p_1 - p_2 = 1$,

$$\lim_{n\to\infty}\frac{\mathrm{MSPED}_{n,2}(p_2) - \sigma_2^2}{\mathrm{MSPEP}_{n,2}(p_1) - \sigma_2^2} = \frac{f_{2,2}(p_2)}{f_{1,2}(p_1)} < 1,$$

and hence $\tilde x_{n+2}(p_2)$ is asymptotically more efficient than $\hat x_{n+2}(p_1)$ in this case.

For general $h$, the ratio $f_{2,h}(p_h)/f_{1,h}(p_1)$ can be larger or smaller than 1, as shown in the following example.
Example 2. First assume that $p_1 = 2$ and $h = 3$. By (2.5), (2.6) and the fact that, when $k \ge p_1$,

$$f_{2,h+1}(k) - f_{1,h+1}(k) = f_{2,h}(k) - f_{1,h}(k) + \mathbf e_k'L_h(k)\Gamma^{-1}(k)L_h'(k)\mathbf e_k\,\sigma^4$$

[see Section 2 of Ing (2003)], where $\mathbf e_k' = (1, 0, \ldots, 0)$ is a $k$-dimensional vector, we have

$$f_{2,3}(2) - f_{1,3}(2) = (1 - a_2^2)\sigma^2 + \mathbf e_2'L_2(2)\Gamma^{-1}(2)L_2'(2)\mathbf e_2\,\sigma^4.$$

Some algebraic manipulations yield $\mathbf e_2'L_2(2)\Gamma^{-1}(2)L_2'(2)\mathbf e_2\,\sigma^4 = (1 + a_2)(1 - a_2 - 4a_1^2a_2)\sigma^2$. Therefore

$$f_{2,3}(2) - f_{1,3}(2) = 2(1 + a_2)(1 - a_2 - 2a_1^2a_2)\sigma^2. \tag{2.12}$$

Note that Bhansali [(1997), page 442] indicated that $f_{2,3}(2) - f_{1,3}(2) = (1 + a_2)(1 - a_2 - 2a_1^2a_2)\sigma^2$. However, one can see that the leading constant 2 on the right-hand side of (2.12) is needed by examining a simple example which assumes that $-1 < a_1 < 1$ and $a_2 = 0$ (in which case the right-hand side of (2.12) reduces to $2\sigma^2$).

Now, assume $b_2 = a_1^2 + a_2 = 0$. Then $p_3 = 1 < 2 = p_1$ and, in view of (2.12),

$$f_{2,3}(2) - f_{1,3}(2) = 2(1 + a_2)(1 - a_2 + 2a_2^2)\sigma^2. \tag{2.13}$$

By Corollary 2.4,

$$f_{2,3}(1) = \frac{1 - 4a_2 + a_2^2}{1 - a_2}\,\sigma^2 \tag{2.14}$$

and

$$f_{2,3}(2) - f_{2,3}(1) = \Big(1 - a_2 + \frac{2a_2^2}{1 - a_2}\Big)\sigma^2. \tag{2.15}$$

According to (2.13)-(2.15),

$$\frac{f_{2,3}(p_3)}{f_{1,3}(p_1)} = \frac{f_{2,3}(1)}{f_{1,3}(2)} = \frac{1 - 4a_2 + a_2^2}{-4a_2 + 2a_2^2 - 2a_2^3 + 4a_2^4}. \tag{2.16}$$

Let the rational function on the right-hand side of (2.16) be denoted by $g(a_2)$ and let the unique solution of the equation $g(a_2) = 1$ with $-1 < a_2 < 0$ be denoted by $\tau$. Then it can be shown that $\tau \approx -0.54977$, $g(a_2) < 1$ if $-1 < a_2 < \tau$ and $g(a_2) > 1$ if $\tau < a_2 < 0$. Therefore, when $h \ge 3$ and $p_h < p_1$, it is not possible to determine the rankings of $\hat x_{n+h}(p_1)$ and $\tilde x_{n+h}(p_h)$ without knowledge of the AR parameters.
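The crossing point $\tau$ can be reproduced with a few lines of code. The sketch below evaluates $g(a_2)$ from (2.16) and locates the root of $g(a_2) = 1$ on $(-1, 0)$ by bisection; bisection is our choice of root finder, and the bracket endpoints are assumptions chosen to avoid the pole of $g$ at $a_2 = 0$.

```python
def g(a2):
    """Right-hand side of (2.16): f_{2,3}(1) / f_{1,3}(2) when a1**2 + a2 = 0."""
    num = 1 - 4 * a2 + a2 ** 2
    den = -4 * a2 + 2 * a2 ** 2 - 2 * a2 ** 3 + 4 * a2 ** 4
    return num / den


def solve_tau(lo=-0.999, hi=-1e-6, tol=1e-12):
    """Bisection for g(a2) = 1 on (-1, 0); g - 1 < 0 near -1 and > 0 near 0."""
    def excess(a):
        return g(a) - 1.0

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(lo) * excess(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


print(round(solve_tau(), 5))                          # about -0.54977
print([round(g(a), 3) for a in (-0.81, -0.64, -0.36, -0.25)])
```

The four values of $a_2$ in the last line correspond to the AR(2) models used in the simulation study that follows, each of which satisfies $a_1^2 + a_2 = 0$.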

To illustrate the results obtained in Example 2, four AR(2) models, each satisfying $a_1^2 + a_2 = 0$ (so that $p_3 = 1$),

$$x_t = 0.9x_{t-1} - 0.81x_{t-2} + \varepsilon_t, \tag{2.17}$$

$$x_t = 0.8x_{t-1} - 0.64x_{t-2} + \varepsilon_t, \tag{2.18}$$

$$x_t = 0.6x_{t-1} - 0.36x_{t-2} + \varepsilon_t \tag{2.19}$$

and

$$x_t = 0.5x_{t-1} - 0.25x_{t-2} + \varepsilon_t, \tag{2.20}$$

are considered in our simulation study, where the $\varepsilon_t$'s are independent and identically $N(0,1)$ distributed. The empirical estimates of $(\mathrm{MSPED}_{n,3}(1) - \sigma_3^2)/(\mathrm{MSPEP}_{n,3}(2) - \sigma_3^2)$ for the above four models are obtained based on 20,000 replications for $n = 150$, 300, 500 and 1000. These empirical estimates and the corresponding limiting values [given by (2.16)] are summarized in Table 1. One can see from these empirical results that $\tilde x_{n+3}(1)$ is more efficient than $\hat x_{n+3}(2)$ for models (2.17) and (2.18), and is less efficient than $\hat x_{n+3}(2)$ for the other two models. This conclusion coincides with that obtained from (2.16). In addition, the empirical estimates of $(\mathrm{MSPED}_{n,3}(1) - \sigma_3^2)/(\mathrm{MSPEP}_{n,3}(2) - \sigma_3^2)$ are rather close to their corresponding limiting values even for $n = 150$.
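A small-scale version of this experiment can be sketched as follows. It is not the code behind Table 1. As a variance-reduction device of our own choosing, it uses the identity $E(x_{n+3} - \text{pred})^2 - \sigma_3^2 = E(\bar x_{n+3} - \text{pred})^2$, which holds because $\eta_{n,3}$ is independent of $x_1, \ldots, x_n$ under i.i.d. noise. The true 3-step predictor $\bar x_{n+3}$ is computed from the known parameters, so no future values need to be simulated. The burn-in length, seed and function names are assumptions, and with few replications the estimate is noisy.

```python
import random


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for j in range(c, m + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * m
    for r in range(m - 1, -1, -1):
        z[r] = (M[r][m] - sum(M[r][j] * z[j] for j in range(r + 1, m))) / M[r][r]
    return z


def ls_coef(x, h, k):
    """Least-squares regression of x_{j+h} on (x_j, ..., x_{j-k+1})."""
    G = [[0.0] * k for _ in range(k)]
    g = [0.0] * k
    for j in range(k - 1, len(x) - h):
        v = [x[j - i] for i in range(k)]
        for r in range(k):
            g[r] += v[r] * x[j + h]
            for c in range(k):
                G[r][c] += v[r] * v[c]
    return solve(G, g)


def plug_in(x, h, k):
    a = ls_coef(x, 1, k)
    path = list(x[-k:])
    for _ in range(h):
        path.append(sum(a[i] * path[-1 - i] for i in range(k)))
    return path[-1]


def direct(x, h, k):
    a = ls_coef(x, h, k)
    return sum(a[i] * x[-1 - i] for i in range(k))


def simulate_ratio(a1, a2, n, reps, seed=0, burn=200):
    """Monte Carlo estimate of (MSPED_{n,3}(1) - s3) / (MSPEP_{n,3}(2) - s3)."""
    rng = random.Random(seed)
    c1, c2 = a1 ** 3 + 2 * a1 * a2, a2 * (a1 ** 2 + a2)   # true 3-step coefficients
    excess_d = excess_p = 0.0
    for _ in range(reps):
        x = [0.0, 0.0]
        for _ in range(burn + n):
            x.append(a1 * x[-1] + a2 * x[-2] + rng.gauss(0.0, 1.0))
        x = x[-n:]
        xbar = c1 * x[-1] + c2 * x[-2]                     # optimal 3-step predictor
        excess_d += (xbar - direct(x, 3, 1)) ** 2
        excess_p += (xbar - plug_in(x, 3, 2)) ** 2
    return excess_d / excess_p
```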

In conclusion, when both the plug-in and direct predictors are taken into account, the optimal multistep prediction results cannot be guaranteed by correctly identifying $p_1$ or $p_h$. Hence, a predictor selection criterion that directly aims at the minimal MSPE (among those of the predictors in families I and II) is called for.

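3. Main results. Since we attempt to choose a candidate predictor among families I and II that has the minimal MSPE (at least for large $n$), the loss functions of the candidate plug-in and direct predictors are defined as

$$L_{1,h}(k) = \begin{cases} \lim_{n\to\infty}n(\mathrm{MSPEP}_{n,h}(k) - \sigma_h^2), & \text{if } p_1 \le k \le K, \\ \infty, & \text{if } k < p_1, \end{cases} \tag{3.1}$$

and

$$L_{2,h}(k) = \begin{cases} \lim_{n\to\infty}n(\mathrm{MSPED}_{n,h}(k) - \sigma_h^2), & \text{if } p_h \le k \le K, \\ \infty, & \text{if } k < p_h, \end{cases} \tag{3.2}$$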



Table 1
Simulation results for $(\mathrm{MSPED}_{n,3}(1) - \sigma_3^2)/(\mathrm{MSPEP}_{n,3}(2) - \sigma_3^2)$

                              Model
n                             (2.17)   (2.18)   (2.19)   (2.20)
150                           0.700    0.891    1.398    1.719
300                           0.688    0.843    1.365    1.782
500                           0.649    0.879    1.365    1.762
1000                          0.673    0.872    1.379    1.761
$f_{2,3}(1)/f_{1,3}(2)$       0.667    0.868    1.382    1.76
respectively, where the existence of the above limits is ensured by Theorems 2.1 and 2.2 (and, for $p_h \le k < p_1$, by Corollary 2.4). To ensure that the prediction loss due to underspecification is much larger than the loss due to overspecification, the loss function values of $(k,1)$ with $k < p_1$ and of $(k,2)$ with $k < p_h$ are set to $\infty$. A predictor selection criterion $(\hat k_n, \hat j_n)$, with $1 \le \hat k_n \le K$ and $1 \le \hat j_n \le 2$, is said to be asymptotically efficient if

$$P((\hat k_n, \hat j_n) \in C_{h,K} \text{ eventually}) = 1, \tag{3.3}$$

where

$$C_{h,K} = \Big\{(k,j) : 1 \le k \le K,\ 1 \le j \le 2 \text{ and } L_{j,h}(k) = \min_{1 \le l \le K,\ 1 \le i \le 2}L_{i,h}(l)\Big\}.$$

Therefore, with probability 1, $(\hat k_n, \hat j_n)$ ultimately chooses a predictor having the minimal loss function value.

Remark 2. Note that $C_{h,K}$ can contain more than one element. To see this, assume that $h = 3$, $p_1 = 2$, $a_1^2 + a_2 = 0$, $a_2 = \tau \approx -0.54977$ and $K \ge 2$. (Recall that $p_3 = 1 < p_1$ in this case.) By Theorems 2.1 and 2.3, Corollary 2.4 and Remark 1, we have $f_{1,3}(k) < f_{1,3}(k+1)$, $f_{2,3}(k) < f_{2,3}(k+1)$ and $f_{1,3}(k) < f_{2,3}(k)$ for $k \ge 2$. Moreover, by Example 2, $f_{1,3}(2) = f_{2,3}(1)$. As a result, there are two elements, namely $(1,2)$ and $(2,1)$, in $C_{3,K}$.
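Given a table of loss values, the set $C_{h,K}$ is simply the collection of minimizers. The sketch below uses hypothetical loss values (not computed from any model) that mimic the pattern of Remark 2, with $p_1 = 2$, $p_3 = 1$ and a tie between $(1,2)$ and $(2,1)$; the relative tolerance used to detect ties in floating point is our assumption.

```python
import math


def efficient_set(loss):
    """C_{h,K}: every (k, j) whose loss L_{j,h}(k) attains the minimum.

    `loss` maps (k, j) to L_{j,h}(k); underspecified pairs carry math.inf.
    """
    best = min(loss.values())
    return {kj for kj, v in loss.items() if math.isclose(v, best, rel_tol=1e-12)}


# Hypothetical losses with K = 3: (1, 1) is underspecified since k < p_1 = 2.
loss = {(1, 1): math.inf, (2, 1): 5.0, (3, 1): 6.0,
        (1, 2): 5.0, (2, 2): 7.0, (3, 2): 8.0}
print(sorted(efficient_set(loss)))  # [(1, 2), (2, 1)]
```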

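The goal of this section is to show that (3.3) is fulfilled by $(\hat k_n, \hat j_n)$. We assume in this section that $\{\varepsilon_t\}$ in model (1.1) is a martingale difference sequence with respect to an increasing sequence of $\sigma$-fields $\{\mathcal F_t\}$, that is, $\varepsilon_t$ is $\mathcal F_t$-measurable and $E(\varepsilon_t \mid \mathcal F_{t-1}) = 0$ a.s. for all $t$. We also assume that, for some $\alpha > 2$,

$$E(\varepsilon_t^2 \mid \mathcal F_{t-1}) = \sigma^2 \quad\text{and}\quad \sup_t E(|\varepsilon_t|^{\alpha} \mid \mathcal F_{t-1}) < \infty \quad\text{a.s.} \tag{3.4}$$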

Note that for $k \ge p_1$,

$$\mathrm{APEP}_{n,h}(k) = \sum_{i=m_h}^{n-h}\{\eta_{i,h} - \mathbf x_i'(k)L_{i,h}(k)(\hat{\mathbf a}_i(1,k) - \mathbf a_D(1,k))\}^2 \tag{3.5}$$

and for $k \ge p_h$,

$$\mathrm{APED}_{n,h}(k) = \sum_{i=m_h}^{n-h}\{\eta_{i,h} - \mathbf x_i'(k)(\tilde{\mathbf a}_i(h,k) - \mathbf a_D(h,k))\}^2, \tag{3.6}$$

where $\eta_{i,h}$ is defined in (1.7) and $L_{i,h}(k) = \sum_{j=0}^{h-1}b_j(\hat A_i'(k))^{h-1-j}$, with $\hat A_i(k)$ defined below (1.2). The asymptotic properties of $\mathrm{APEP}_{n,h}(k) = \mathrm{APED}_{n,h}(k)$ with $h = 1$ were investigated by Wei (1987, 1992) in stochastic regression models. One of the key steps in Wei's analysis is to express the (second-order) residual sum of squares of the fitted (by least squares) model in a recursive form. His approach, however, cannot be directly applied to the situation considered in this article. This is because for $\mathrm{APEP}_{n,h}(k)$ with $h \ge 2$ there is a random matrix $L_{i,h}(k)$ that lies between $\mathbf x_i'(k)$ and $(\hat{\mathbf a}_i(1,k) - \mathbf a_D(1,k))$, and for $\mathrm{APED}_{n,h}(k)$ with $h \ge 2$ the rightmost component $\sum_{j=k}^{n-h}\mathbf x_j(k)\eta_{j,h}$ of the centered estimator

$$\tilde{\mathbf a}_n(h,k) - \mathbf a_D(h,k) = \frac{1}{n-h-k+1}\hat\Gamma_n^{-1}(h,k)\sum_{j=k}^{n-h}\mathbf x_j(k)\eta_{j,h}$$

is no longer a martingale transform (for $h \ge 2$, $\eta_{j,h}$ depends on $\varepsilon_{j+1}, \ldots, \varepsilon_{j+h}$ and is therefore not $\mathcal F_{j+1}$-measurable). Therefore, some new technical tools are needed to overcome these difficulties.
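The decomposition behind (3.5) rests on a telescoping identity: for companion matrices built from $\hat{\mathbf a}$ and $\mathbf a$, $(\hat A')^{h-1}\hat{\mathbf a} - (A')^{h-1}\mathbf a = \sum_{j=0}^{h-1}b_j(\hat A')^{h-1-j}(\hat{\mathbf a} - \mathbf a)$, with $b_j$ the $(1,1)$ entry of $A^j$ for the true $A$. The sketch below checks this identity numerically; the coefficient vectors are arbitrary illustrative values, and the function names are ours.

```python
def companion_t_apply(a, v):
    """Return A'v, where A is the companion matrix whose first row is a."""
    k = len(a)
    return [v[0] * a[i] + (v[i + 1] if i + 1 < k else 0.0) for i in range(k)]


def apply_power(a, v, m):
    """Return (A')^m v."""
    for _ in range(m):
        v = companion_t_apply(a, v)
    return v


def ma_coefficients(a, h):
    """b_0, ..., b_{h-1}: b_j = (A^j)_{11}, the first entry of (A')^j e_1."""
    e1 = [1.0] + [0.0] * (len(a) - 1)
    return [apply_power(a, e1, j)[0] for j in range(h)]


def check_decomposition(a, a_hat, h):
    """Max abs gap between (A_hat')^{h-1} a_hat - (A')^{h-1} a and L_h (a_hat - a),
    where L_h = sum_j b_j (A_hat')^{h-1-j} and the b_j come from the true a."""
    lhs = [u - w for u, w in zip(apply_power(a_hat, a_hat, h - 1), apply_power(a, a, h - 1))]
    d = [u - w for u, w in zip(a_hat, a)]
    rhs = [0.0] * len(a)
    for j, bj in enumerate(ma_coefficients(a, h)):
        term = apply_power(a_hat, d, h - 1 - j)
        rhs = [r + bj * t for r, t in zip(rhs, term)]
    return max(abs(u - w) for u, w in zip(lhs, rhs))


print(check_decomposition([0.5, 0.3, 0.0], [0.45, 0.35, -0.05], 4))  # ~0
```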

Theorems 3.1 and 3.2 describe the asymptotic behavior of $\mathrm{APEP}_{n,h}(k)$ and $\mathrm{APED}_{n,h}(k)$ in the correctly specified case.

Theorem 3.1. Assume that $\{x_t\}$ satisfies model (1.1). Also assume condition (3.4). Then for $k \ge p_1$ and $h \ge 1$,

$$\mathrm{APEP}_{n,h}(k) - \sum_{i=m_h}^{n-h}\eta_{i,h}^2 = f_{1,h}(k)\log n + o(\log n) \quad\text{a.s.} \tag{3.7}$$
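Proof. Rewrite the right-hand side of (3.5) as

$$\sum_{i=m_h}^{n-h}\eta_{i,h}^2 - 2\sum_{i=m_h}^{n-h}\{\mathbf x_i'(k)L_{i,h}(k)(\hat{\mathbf a}_i(1,k) - \mathbf a_D(1,k))\}\eta_{i,h} + \sum_{i=m_h}^{n-h}\{\mathbf x_i'(k)L_{i,h}(k)(\hat{\mathbf a}_i(1,k) - \mathbf a_D(1,k))\}^2.$$

This and Chow (1965) yield that

$$\mathrm{APEP}_{n,h}(k) - \sum_{i=m_h}^{n-h}\eta_{i,h}^2 = \sum_{i=m_h}^{n-h}\{\mathbf x_i'(k)L_{i,h}(k)(\hat{\mathbf a}_i(1,k) - \mathbf a_D(1,k))\}^2(1 + o(1)) + O(1) \quad\text{a.s.} \tag{3.8}$$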
To deal with the right-hand side of (3.8), we first introduce $Q_n^*(h,k)$, where

$$Q_n^*(h,k) = \Big(\sum_{i=k}^{n-h}\mathbf x_i(k)\varepsilon_{i+1}\Big)'S'V_{n-h}S\Big(\sum_{i=k}^{n-h}\mathbf x_i(k)\varepsilon_{i+1}\Big) \tag{3.9}$$

with $S = \Gamma(k)L_h(k)\Gamma^{-1}(k)$ and $V_n = (\sum_{j=k}^{n}\mathbf x_j(k)\mathbf x_j'(k))^{-1}$.

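Following Lai and Wei [(1982), equation (2.16)], we obtain a recursive expression for $Q_n^*(h,k)$:

$$Q_n^*(h,k) + \sum_{i=m_h}^{n-h}\Big\{\mathbf x_i'(k)V_{i-1}S\sum_{j=k}^{i-1}\mathbf x_j(k)\varepsilon_{j+1}\Big\}^2c_i^{-1} = Q^*_{m_h+h-1}(h,k) + \sum_{i=m_h}^{n-h}\mathbf x_i'(k)S'V_{i-1}S\mathbf x_i(k)\varepsilon_{i+1}^2 + \mathrm{I} + \mathrm{II} + \mathrm{III}, \tag{3.10}$$

where

$$\mathrm{I} = 2\sum_{i=m_h}^{n-h}\mathbf x_i'(k)S'V_{i-1}S\Big(\sum_{j=k}^{i-1}\mathbf x_j(k)\varepsilon_{j+1}\Big)\varepsilon_{i+1},$$

$$\mathrm{II} = -2\sum_{i=m_h}^{n-h}(\mathbf x_i'(k)S'V_{i-1}\mathbf x_i(k))\,\mathbf x_i'(k)V_{i-1}S\Big(\sum_{j=k}^{i-1}\mathbf x_j(k)\varepsilon_{j+1}\Big)\varepsilon_{i+1}c_i^{-1}$$

and

$$\mathrm{III} = -\sum_{i=m_h}^{n-h}(\mathbf x_i'(k)S'V_{i-1}\mathbf x_i(k))^2\varepsilon_{i+1}^2c_i^{-1},$$

with $c_i = 1 + \mathbf x_i'(k)V_{i-1}\mathbf x_i(k)$. By (3.4), Theorem 2 of Lai and Wei (1985) and the martingale strong law of Lai and Wei (1982), we have

$$\lim_{n\to\infty}\frac{V_n^{-1}}{n} = \Gamma(k) \quad\text{a.s.}, \tag{3.11}$$

which, together with (3.4) and an analogy with (2.31) of Wei (1987), yields

$$Q_n^*(h,k) = o(\log n) \quad\text{a.s.} \tag{3.12}$$

Since $c_n = (1 - \mathbf x_n'(k)V_n\mathbf x_n(k))^{-1}$, by Theorem 4 of Lai and Wei (1983) [which ensures that $\lim_{n\to\infty}\mathbf x_n'(k)V_n\mathbf x_n(k) = 0$ a.s.], we have

$$\lim_{n\to\infty}c_n = 1 \quad\text{a.s.} \tag{3.13}$$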
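The algebra behind (3.10) and the identity $c_n = (1 - \mathbf x_n'(k)V_n\mathbf x_n(k))^{-1}$ are deterministic, so they can be checked numerically on any data. The sketch below assumes $k = 2$, Gaussian regressors and noise, and an arbitrary fixed $2 \times 2$ matrix standing in for $S$ (none of these choices come from the text). At each step it compares the Sherman-Morrison update of $V_i$ with a direct inverse, checks the $c_i$ identity, and verifies the one-step version of (3.10), whose right-hand side collects the $i$th summands of $\sum\mathbf x_i'S'V_{i-1}S\mathbf x_i\varepsilon_{i+1}^2$, I, II and III.

```python
import random


def inv2(M):
    (p, q), (r, s) = M
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def mv(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def check_recursion(n=60, seed=0):
    """Largest discrepancy in the Sherman-Morrison update, the c_i identity
    and the one-step form of (3.10)."""
    rng = random.Random(seed)
    S = [[1.3, -0.4], [0.2, 0.7]]       # arbitrary stand-in for Gamma L_h Gamma^{-1}
    G = [[0.0, 0.0], [0.0, 0.0]]        # running sum of x_j x_j'
    U = [0.0, 0.0]                      # running sum of x_j eps_{j+1}
    worst = 0.0
    for i in range(n):
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        e = rng.gauss(0.0, 1.0)
        if i < 3:                       # burn-in so that G is invertible
            G = [[G[r][q] + x[r] * x[q] for q in range(2)] for r in range(2)]
            U = [U[r] + x[r] * e for r in range(2)]
            continue
        Vp = inv2(G)                    # V_{i-1}
        Vx = mv(Vp, x)
        c = 1.0 + dot(x, Vx)            # c_i
        G = [[G[r][q] + x[r] * x[q] for q in range(2)] for r in range(2)]
        V = inv2(G)                     # V_i
        sm = [[Vp[r][q] - Vx[r] * Vx[q] / c for q in range(2)] for r in range(2)]
        worst = max(worst, max(abs(V[r][q] - sm[r][q]) for r in range(2) for q in range(2)))
        worst = max(worst, abs(c - 1.0 / (1.0 - dot(x, mv(V, x)))))
        Sx, SU = mv(S, x), mv(S, U)
        a_ = dot(Sx, mv(Vp, Sx))        # x'S'V_{i-1}Sx
        b_ = dot(Sx, mv(Vp, SU))        # x'S'V_{i-1}S U_{i-1}
        s_ = dot(Sx, Vx)                # x'S'V_{i-1}x
        t_ = dot(Vx, SU)                # x'V_{i-1}S U_{i-1}
        q_old = dot(SU, mv(Vp, SU))     # Q* before adding observation i
        U = [U[r] + x[r] * e for r in range(2)]
        SU = mv(S, U)
        q_new = dot(SU, mv(V, SU))      # Q* after adding observation i
        lhs = q_new - q_old + t_ ** 2 / c
        rhs = a_ * e ** 2 + 2 * b_ * e - 2 * s_ * t_ * e / c - s_ ** 2 * e ** 2 / c
        worst = max(worst, abs(lhs - rhs))
    return worst


print(check_recursion())  # rounding-level discrepancy only
```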

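Now, by (3.4), (3.12), (3.13) and Chow (1965), we can rewrite (3.10) as

$$(1 + o(1))\sum_{i=m_h}^{n-h}\Big\{\mathbf x_i'(k)V_{i-1}S\sum_{j=k}^{i-1}\mathbf x_j(k)\varepsilon_{j+1}\Big\}^2 = o(\log n) + O(1) + (1 + o(1))\sigma^2\sum_{i=m_h}^{n-h}\mathbf x_i'(k)S'V_{i-1}S\mathbf x_i(k) + \mathrm{I} + \mathrm{II} + \mathrm{III} \quad\text{a.s.} \tag{3.14}$$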
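Reasoning as in the proof of Lemma 2.1 of Wei (1992), we obtain

$$\lim_{n\to\infty}\frac{\sigma^2}{\log n}\sum_{i=m_h}^{n-h}\mathbf x_i'(k)S'V_{i-1}S\mathbf x_i(k) = f_{1,h}(k) \quad\text{a.s.} \tag{3.15}$$

It is shown in the Appendix that

$$\mathrm{I} = o(\log n) \text{ a.s.} \quad\text{and}\quad \mathrm{II} = o(\log n) \text{ a.s.} \tag{3.16}$$

Moreover, by (3.11), Theorem 3 of Lai and Wei (1983), (2.10) and (2.12) of Lai and Wei (1982), and an analogy with Lemma 2.1 of Wei (1992),

$$\mathrm{III} = O(1) + o\Big(\sum_{i=m_h}^{n-h}|\mathbf x_i'(k)S'V_{i-1}\mathbf x_i(k)|\varepsilon_{i+1}^2\Big) = o(\log n) \quad\text{a.s.}$$

This, together with (3.14)-(3.16), yields

$$\sum_{i=m_h}^{n-h}\Big\{\mathbf x_i'(k)V_{i-1}S\sum_{j=k}^{i-1}\mathbf x_j(k)\varepsilon_{j+1}\Big\}^2 = f_{1,h}(k)\log n + o(\log n) \quad\text{a.s.} \tag{3.17}$$

In view of (3.8) and (3.17), this proof is completed if we can show that

$$\sum_{i=m_h}^{n-h}\{\mathbf x_i'(k)L_{i,h}(k)(\hat{\mathbf a}_i(1,k) - \mathbf a_D(1,k))\}^2 = \sum_{i=m_h}^{n-h}\Big\{\mathbf x_i'(k)V_{i-1}S_i\sum_{j=k}^{i-1}\mathbf x_j(k)\varepsilon_{j+1}\Big\}^2 = \sum_{i=m_h}^{n-h}\Big\{\mathbf x_i'(k)V_{i-1}S\sum_{j=k}^{i-1}\mathbf x_j(k)\varepsilon_{j+1}\Big\}^2 + o(\log n) \quad\text{a.s.}, \tag{3.18}$$

where $S_i = V_{i-1}^{-1}L_{i,h}(k)V_{i-1}$ (the first equality uses $\hat{\mathbf a}_i(1,k) - \mathbf a_D(1,k) = V_{i-1}\sum_{j=k}^{i-1}\mathbf x_j(k)\varepsilon_{j+1}$, which holds for $k \ge p_1$). Since, by (3.11) and Theorem 1 of Lai and Wei (1983), $\lim_{n\to\infty}S_n = S$ a.s., this fact and (A.1) imply that

$$\sum_{i=m_h}^{n-h}\Big\{\mathbf x_i'(k)V_{i-1}(S_i - S)\sum_{j=k}^{i-1}\mathbf x_j(k)\varepsilon_{j+1}\Big\}^2 = o(\log n) \quad\text{a.s.} \tag{3.19}$$

Consequently, (3.18) follows from (3.17), (3.19) and the Cauchy-Schwarz inequality. □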
Theorem 3.2. Let the assumptions of Theorem 3.1 hold. Then for $k \ge p_h$ and $h \ge 1$,

$$\mathrm{APED}_{n,h}(k) - \sum_{i=m_h}^{n-h}\eta_{i,h}^2 = f_{2,h}(k)\log n + o(\log n) \quad\text{a.s.} \tag{3.20}$$

Proof. We only show (3.20) for $h = 2$, because the result for $h \ge 3$ can be obtained similarly and that for $h = 1$ was verified in Wei (1992). Reasoning as for (3.8), we have, for $k \ge p_h$,

$$\mathrm{APED}_{n,2}(k) - \sum_{i=m_h}^{n-2}\eta_{i,2}^2 = (1 + o(1))\sum_{i=m_h}^{n-2}\Big\{\mathbf x_i'(k)V_{i-2}\sum_{j=k}^{i-2}\mathbf x_j(k)\eta_{j,2}\Big\}^2 + O(1) \quad\text{a.s.} \tag{3.21}$$
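Now consider

$$Q_n(2,k) = \Big(\sum_{i=k}^{n-2}\mathbf x_i(k)\eta_{i,2}\Big)'V_{n-2}\Big(\sum_{i=k}^{n-2}\mathbf x_i(k)\eta_{i,2}\Big).$$

Following Theorem 1 of Wei (1987) and (3.14), we have

$$(1 + o(1))T(k) = Q_{m_h+1}(2,k) - Q_n(2,k) + B(k) + C(k), \tag{3.22}$$

where

$$T(k) = \sum_{i=m_h}^{n-2}\Big\{\mathbf x_i'(k)V_{i-1}\sum_{j=k}^{i-1}\mathbf x_j(k)\eta_{j,2}\Big\}^2, \qquad B(k) = \sum_{i=m_h}^{n-2}\mathbf x_i'(k)V_i\mathbf x_i(k)\eta_{i,2}^2$$

and

$$C(k) = 2\sum_{i=m_h}^{n-2}\mathbf x_i'(k)V_{i-1}\Big[\sum_{j=k}^{i-1}\mathbf x_j(k)\eta_{j,2}\Big]c_i^{-1}\eta_{i,2}.$$

[Notice that by Theorem 3 of Lai and Wei (1983) and (3.11), (3.13) still holds with $p_h \le k < p_1$.]

In what follows we deal with $Q_n(2,k)$, $B(k)$ and $C(k)$ separately. For $Q_n(2,k)$, by an analogy with Theorem 3 of Wei (1987),

$$Q_n(2,k) = o\Big(\log\Big(\sum_{i=k}^{n-2}(\|\mathbf x_i(k)\|^2 + \|a_1\mathbf x_{i+1}(k)\|^2)\Big)\Big) = o(\log n) \quad\text{a.s.}, \tag{3.23}$$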
where the second equality is ensured by (3.11). For $B(k)$ we have

$$B(k) = \sum_{i=m_h}^{n-2}\mathbf x_i'(k)V_i\mathbf x_i(k)\varepsilon_{i+2}^2 + a_1^2\sum_{i=m_h}^{n-2}\mathbf x_i'(k)V_i\mathbf x_i(k)\varepsilon_{i+1}^2 + 2a_1\sum_{i=m_h}^{n-2}\mathbf x_i'(k)V_i\mathbf x_i(k)\varepsilon_{i+1}\varepsilon_{i+2}. \tag{3.24}$$

According to Theorem 1 of Wei (1987), (3.11), (3.13) and Chow (1965), the right-hand side of (3.24) can be further expressed as

$$\sigma^2(1 + a_1^2)k\log n + o\Big(\sum_{i=m_h}^{n-2}\mathbf x_i'(k)V_i\mathbf x_i(k)\varepsilon_{i+1}^2\Big) + o(\log n) = \sigma^2(1 + a_1^2)k\log n + o(\log n) \quad\text{a.s.} \tag{3.25}$$

Therefore

$$B(k) = \sigma^2(1 + a_1^2)k\log n + o(\log n) \quad\text{a.s.} \tag{3.26}$$
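The factor $k\log n$ in (3.26) reflects a standard determinant identity (the matrix determinant lemma): since $\det V_i^{-1} = c_i\det V_{i-1}^{-1}$ and $\mathbf x_i'V_i\mathbf x_i = 1 - c_i^{-1} \le \log c_i$, one has $\sum_i\log c_i = \log\det V_n^{-1} - \log\det V_{m-1}^{-1}$, and by (3.11) the right-hand side behaves like $k\log n$. The sketch below checks the exact identity on a simulated AR(2) path with $k = 2$; the model, seed and start-up length are arbitrary assumptions, and this is offered as intuition for the log n rate rather than as the argument of Wei (1987).

```python
import math
import random


def inv2(M):
    (p, q), (r, s) = M
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def quad(M, v):
    """v' M v for a 2 x 2 matrix M."""
    return sum(v[i] * M[i][j] * v[j] for i in range(2) for j in range(2))


def logdet_identity(n=2000, seed=0, start=5):
    """Return (sum x_i'V_i x_i, sum log c_i, |sum log c_i - log det ratio|)."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n + 1):
        x.append(0.5 * x[-1] + 0.3 * x[-2] + rng.gauss(0.0, 1.0))
    vecs = [[x[t], x[t - 1]] for t in range(1, len(x))]   # x_t(2)
    G = [[0.0, 0.0], [0.0, 0.0]]
    for v in vecs[:start]:                                # start-up so G is invertible
        G = [[G[r][q] + v[r] * v[q] for q in range(2)] for r in range(2)]
    logdet0 = math.log(det2(G))
    sum_w = sum_logc = 0.0
    for v in vecs[start:]:
        c = 1.0 + quad(inv2(G), v)                        # c_i = 1 + x_i'V_{i-1}x_i
        G = [[G[r][q] + v[r] * v[q] for q in range(2)] for r in range(2)]
        sum_w += quad(inv2(G), v)                         # x_i'V_i x_i
        sum_logc += math.log(c)
    gap = abs(sum_logc - (math.log(det2(G)) - logdet0))
    return sum_w, sum_logc, gap
```

As $n$ grows, `sum_w / math.log(n)` approaches $k = 2$, though slowly because of the $O(1)$ start-up term.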
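To deal with $C(k)$, we have

$$\tfrac{1}{2}C(k) = D(k) + E(k) + F(k) + G(k) + H(k), \tag{3.27}$$

where

$$D(k) = \sum_{i=m_h}^{n-2}\mathbf x_i'(k)V_{i-1}\Big(\sum_{j=k}^{i-2}\mathbf x_j(k)\eta_{j,2}\Big)c_i^{-1}(\varepsilon_{i+2} + a_1\varepsilon_{i+1}),$$

$$E(k) = a_1^2\sum_{i=m_h}^{n-2}\mathbf x_i'(k)V_{i-1}\mathbf x_{i-1}(k)c_i^{-1}\varepsilon_i\varepsilon_{i+1},$$

$$F(k) = a_1\sum_{i=m_h}^{n-2}\mathbf x_i'(k)V_{i-1}\mathbf x_{i-1}(k)c_i^{-1}\varepsilon_{i+1}^2,$$

$$G(k) = \sum_{i=m_h}^{n-2}\mathbf x_i'(k)V_{i-1}\mathbf x_{i-1}(k)c_i^{-1}\varepsilon_{i+1}\varepsilon_{i+2},$$

$$H(k) = a_1\sum_{i=m_h}^{n-2}\mathbf x_i'(k)V_{i-1}\mathbf x_{i-1}(k)c_i^{-1}\varepsilon_i\varepsilon_{i+2}.$$

By (3.4), (3.13) and Lemma 2(iii) of Lai and Wei (1982), we can show that

$$D(k) = o\Big(\sum_{i=m_h}^{n-2}\Big[\mathbf x_i'(k)V_{i-1}\Big(\sum_{j=k}^{i-2}\mathbf x_j(k)\eta_{j,2}\Big)\Big]^2\Big) + O(1) \quad\text{a.s.} \tag{3.28}$$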
Similarly,

$$E(k) = o\Big(\sum_{i=m_h}^{n-2}(\mathbf x_i'(k)V_{i-1}\mathbf x_{i-1}(k))^2\varepsilon_i^2\Big) + O(1) = o\Big(\sum_{i=m_h}^{n-2}\mathbf x_{i-1}'(k)V_{i-1}\mathbf x_{i-1}(k)\varepsilon_i^2\Big) + O(1) = o(\log n) \quad\text{a.s.}, \tag{3.29}$$

where the second equality is ensured by (3.13) and the Cauchy-Schwarz inequality, and the last equality is guaranteed by the same argument used to obtain Theorem 1 of Wei (1987). The same reasoning that shows (3.29) also gives

$$G(k) = o(\log n) \quad\text{a.s.} \tag{3.30}$$

and

$$H(k) = o(\log n) \quad\text{a.s.} \tag{3.31}$$

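We now deal with $F(k)$. By an analogy with Lai and Wei (1982) we can show that

$$\sum_{i=m_h}^{n-2}\mathbf x_i'(k)V_{i-1}\mathbf x_{i-1}(k)c_i^{-1}\varepsilon_{i+1}^2 = \sigma^2\sum_{i=m_h}^{n-2}\mathbf x_i'(k)V_{i-1}\mathbf x_{i-1}(k)c_i^{-1} + o\Big(\sum_{i=m_h}^{n-2}|\mathbf x_i'(k)V_{i-1}\mathbf x_{i-1}(k)|\Big) + O(1) \quad\text{a.s.} \tag{3.32}$$

By an argument similar to that used for showing Lemma 2.1 of Wei (1992), the Cauchy-Schwarz inequality and (3.13), we have

$$\sum_{i=m_h}^{n-2}\mathbf x_i'(k)V_{i-1}\mathbf x_{i-1}(k)c_i^{-1} = \mathrm{tr}(\Gamma^{-1}(k)E_1(k))\log n + o(\log n) \quad\text{a.s.},$$

where $E_1(k) = E(\mathbf x_k(k)\mathbf x_{k-1}'(k))$, and

$$\sum_{i=m_h}^{n-2}|\mathbf x_i'(k)V_{i-1}\mathbf x_{i-1}(k)| = O(\log n) \quad\text{a.s.}$$

These results, (3.32) and the fact that $\mathrm{tr}(\Gamma^{-1}(k)E_1(k)) = a_1(1,k)$ [note that $a_1(1,k) = a_1$ when $k \ge p_1$; see Section 1 for the definition of $a_j(h,k)$] together imply that

$$F(k) = a_1a_1(1,k)\sigma^2\log n + o(\log n) \quad\text{a.s.} \tag{3.33}$$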
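In view of (3.27)-(3.31) and (3.33), we have

$$C(k) = 2a_1a_1(1,k)\sigma^2\log n + o\Big(\sum_{i=m_h}^{n-2}\Big|\mathbf x_i'(k)V_{i-1}\Big[\sum_{j=k}^{i-2}\mathbf x_j(k)\eta_{j,2}\Big]\Big|^2\Big) + o(\log n) \quad\text{a.s.} \tag{3.34}$$

Since

$$\sum_{i=m_h}^{n-2}\Big\{\mathbf x_i'(k)V_{i-1}\sum_{j=k}^{i-2}\mathbf x_j(k)\eta_{j,2}\Big\}^2 = \sum_{i=m_h}^{n-2}\Big\{\mathbf x_i'(k)V_{i-1}\sum_{j=k}^{i-1}\mathbf x_j(k)\eta_{j,2} - \mathbf x_i'(k)V_{i-1}\mathbf x_{i-1}(k)\eta_{i-1,2}\Big\}^2, \tag{3.35}$$

by the Cauchy-Schwarz inequality and an argument similar to that used to show (3.29), the right-hand side of (3.35) equals

$$(1 + o(1))T(k) + o(\log n) \quad\text{a.s.} \tag{3.36}$$

This fact, (3.22), (3.23), (3.26) and (3.34) yield

$$(1 + o(1))T(k) = \{(1 + a_1^2)k + 2a_1a_1(1,k)\}\sigma^2\log n + o(\log n) \quad\text{a.s.} \tag{3.37}$$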

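According to (2.3), (3.21) and (3.37), (3.20) is obtained if we can show that

$$\sum_{i=m_h}^{n-2}\Big\{\mathbf x_i'(k)V_{i-2}\sum_{j=k}^{i-2}\mathbf x_j(k)\eta_{j,2}\Big\}^2 = T(k) + o(\log n) \quad\text{a.s.} \tag{3.38}$$

To show (3.38), first observe that

$$\begin{aligned}\mathbf x_i'(k)V_{i-1}\sum_{j=k}^{i-1}\mathbf x_j(k)\eta_{j,2} ={}& \mathbf x_i'(k)V_{i-2}\sum_{j=k}^{i-2}\mathbf x_j(k)\eta_{j,2} + \mathbf x_i'(k)V_{i-2}\mathbf x_{i-1}(k)\eta_{i-1,2} \\ &- \frac{\mathbf x_i'(k)V_{i-2}\mathbf x_{i-1}(k)}{1 + \mathbf x_{i-1}'(k)V_{i-2}\mathbf x_{i-1}(k)}\,\mathbf x_{i-1}'(k)V_{i-2}\sum_{j=k}^{i-2}\mathbf x_j(k)\eta_{j,2} \\ &- \frac{\mathbf x_i'(k)V_{i-2}\mathbf x_{i-1}(k)}{1 + \mathbf x_{i-1}'(k)V_{i-2}\mathbf x_{i-1}(k)}\,\mathbf x_{i-1}'(k)V_{i-2}\mathbf x_{i-1}(k)\eta_{i-1,2}.\end{aligned}$$

This fact, Theorem 4 of Lai and Wei (1983), and an argument similar to that used to show (3.36) yield

$$T(k) = (1 + o(1))\sum_{i=m_h}^{n-2}\Big\{\mathbf x_i'(k)V_{i-2}\sum_{j=k}^{i-2}\mathbf x_j(k)\eta_{j,2}\Big\}^2 + o(\log n) \quad\text{a.s.},$$

as asserted. □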

Remark 3. Interestingly, it can be seen from Corollary 2.4 and Theorems 2.1, 2.2, 3.1 and 3.2
that the constant associated with the $1/n$ term of $\mathrm{MSPEP}_{n,h}(k)$, $f_{1,h}(k)$,
appears in the $\log n$ term of $\mathrm{APEP}_{n,h}(k)$, and that the constant associated with
the $1/n$ term of $\mathrm{MSPED}_{n,h}(k)$, $f_{2,h}(k)$, appears in the $\log n$ term of
$\mathrm{APED}_{n,h}(k)$. When $p_1$ and $p_h$ are known, these special features allow the sign
of $f_{1,h}(p_1) - f_{2,h}(p_h)$ to be determined by comparing the values of
$\mathrm{APEP}_{n,h}(p_1)$ and $\mathrm{APED}_{n,h}(p_h)$. This is because, according to (3.7) and
(3.20), if $f_{1,h}(p_1) > f_{2,h}(p_h)$, then

$$P\big(\mathrm{APEP}_{n,h}(p_1) > \mathrm{APED}_{n,h}(p_h) \text{ eventually}\big) = 1, \tag{3.39}$$

and if $f_{1,h}(p_1) < f_{2,h}(p_h)$, then

$$P\big(\mathrm{APEP}_{n,h}(p_1) < \mathrm{APED}_{n,h}(p_h) \text{ eventually}\big) = 1. \tag{3.40}$$

Equalities (3.39) and (3.40) show that if $f_{1,h}(p_1) \ne f_{2,h}(p_h)$, then with probability 1
the sign of $\mathrm{APEP}_{n,h}(p_1) - \mathrm{APED}_{n,h}(p_h)$ ultimately equals the sign
of $f_{1,h}(p_1) - f_{2,h}(p_h)$.
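Remark 3 turns on comparing the two accumulated prediction errors. As an illustration only, the following Python sketch computes both for a fixed order $k$ by refitting least squares at every forecast origin: the plug-in predictor iterates the fitted one-step AR($k$) model $h$ times, while the direct predictor regresses $x_{t+h}$ on $(x_t, \ldots, x_{t-k+1})$. Assumptions not taken from the paper: zero-mean data with no intercept, 0-based indexing, normal-equation least squares, and a user-supplied first origin `start` standing in for $m_h$ of (1.9), which should satisfy `start >= 2*k + h - 2` so that every regression has at least $k$ rows. All function names are hypothetical.

```python
import random


def solve(a, b):
    """Solve a @ sol = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (m[r][n] - tail) / m[r][r]
    return sol


def ls_coef(x, k, h, last):
    """LS coefficients of x[t+h] on (x[t], ..., x[t-k+1]) using only t + h <= last."""
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for t in range(k - 1, last - h + 1):
        reg = [x[t - j] for j in range(k)]
        for r in range(k):
            xty[r] += reg[r] * x[t + h]
            for c in range(k):
                xtx[r][c] += reg[r] * reg[c]
    return solve(xtx, xty)


def predict_plugin(x, k, h, i):
    """Iterate the order-k one-step fit (data up to time i) h times."""
    a = ls_coef(x, k, 1, i)
    hist = [x[i - j] for j in range(k)]  # most recent value first
    for _ in range(h):
        nxt = sum(aj * v for aj, v in zip(a, hist))
        hist = [nxt] + hist[:-1]
    return hist[0]


def predict_direct(x, k, h, i):
    """Regress x[t+h] on x_t(k) with data up to time i, then apply it at t = i."""
    b = ls_coef(x, k, h, i)
    return sum(bj * x[i - j] for j, bj in enumerate(b))


def ape(x, k, h, start, method):
    """Accumulated squared h-step errors over origins start, ..., len(x) - 1 - h."""
    pred = predict_plugin if method == "plugin" else predict_direct
    return sum((x[i + h] - pred(x, k, h, i)) ** 2 for i in range(start, len(x) - h))


# Usage on a simulated AR(1) series (illustrative data, not from the paper).
rng = random.Random(7)
x = [0.0]
for _ in range(199):
    x.append(0.6 * x[-1] + rng.gauss(0.0, 1.0))
ape_p = ape(x, 1, 3, 10, "plugin")
ape_d = ape(x, 1, 3, 10, "direct")
```

For $h = 1$ the two sums coincide, because the plug-in and direct predictors are then the same regression. For $h \ge 2$ and the true orders, Remark 3 suggests reading the sign of `ape_p - ape_d` as an estimate of the sign of $f_{1,h}(p_1) - f_{2,h}(p_h)$ once $n$ is large.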

Theorem 3.3 below deals with the asymptotic performance of $\mathrm{APEP}_{n,h}(k)$
and $\mathrm{APED}_{n,h}(k)$ in underspecified cases.

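Theorem 3.3. Let the assumptions of Theorem 3.1 hold. Then for $1 \le k < p_1$ and $h \ge 1$,

$$\begin{aligned}
\frac{1}{n}\Big(\mathrm{APEP}_{n,h}(k) - \sum_{i=m_h}^{n-h}\eta_{i,h}^2\Big)
&= \big(\mathbf{a}_D(h,p_1) - \mathbf{a}_D(h,k)\big)'\Gamma(p_1)\big(\mathbf{a}_D(h,p_1) - \mathbf{a}_D(h,k)\big) \\
&\quad + \big(\mathbf{a}(h,k) - \mathbf{a}_D(h,k)\big)'\Gamma(k)\big(\mathbf{a}(h,k) - \mathbf{a}_D(h,k)\big) + o(1) \quad \text{a.s.},
\end{aligned} \tag{3.41}$$

where $\mathbf{a}(h,k) = A^{h-1}(k)\mathbf{a}_D(1,k)$ with $A(k)$ defined after (2.2), and $\mathbf{a}_D(h,k)$
in the first term of the right-hand side is viewed as a $p_1$-dimensional vector
with undefined entries set to zero; and for $1 \le k < p_h$ and $h \ge 1$,

$$\frac{1}{n}\Big(\mathrm{APED}_{n,h}(k) - \sum_{i=m_h}^{n-h}\eta_{i,h}^2\Big) = \big(\mathbf{a}_D(h,p_h) - \mathbf{a}_D(h,k)\big)'\Gamma(p_h)\big(\mathbf{a}_D(h,p_h) - \mathbf{a}_D(h,k)\big) + o(1) \quad \text{a.s.}, \tag{3.42}$$

where $\mathbf{a}_D(h,k)$ on the right-hand side is viewed as a $p_h$-dimensional vector
with undefined entries set to zero.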
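The vector $\mathbf{a}(h,k)$ in (3.41) is the coefficient vector of the $h$-step plug-in predictor built from the (possibly misspecified) one-step coefficients $\mathbf{a}_D(1,k)$. A minimal Python sketch (the helper `plugin_coefficients` is hypothetical) obtains such a vector by iterating the one-step recursion directly rather than by forming $A^{h-1}(k)$, whose exact definition after (2.2) is not reproduced here. It assumes the sign convention $x_{t+1} = \sum_{j=1}^{k} a_j x_{t+1-j} + \varepsilon_{t+1}$, which may differ from the convention of model (1.1).

```python
def plugin_coefficients(a, h):
    """Coefficients c with xhat_{t+h} = sum_j c[j] * x_{t-j}, j = 0, ..., k-1.

    The one-step predictor xhat_{t+1} = sum_j a[j] * x_{t-j} is iterated h times,
    with predicted values replacing unobserved future values.
    """
    k = len(a)
    # window[j] is the coefficient vector, over (x_t, ..., x_{t-k+1}), of the
    # value j steps before the current forecast time; start from the data itself.
    window = [[1.0 if c == j else 0.0 for c in range(k)] for j in range(k)]
    for _ in range(h):
        nxt = [sum(a[j] * window[j][c] for j in range(k)) for c in range(k)]
        window = [nxt] + window[:-1]
    return window[0]


# AR(2)-type example: x_{t+1} = 0.3 x_t + 0.2 x_{t-1} + e_{t+1}.
coef_two_step = plugin_coefficients([0.3, 0.2], 2)  # approximately [0.29, 0.06]
```

For $h = 2$ the result is $(a_1^2 + a_2,\ a_1 a_2)$, which is what substituting the one-step predictor into itself gives by hand.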
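Proof. Following Hemerly and Davis (1989) [which deals with $\mathrm{APEP}_{n,h}(k)$
with $h = 1$], we have

$$\begin{aligned}
\mathrm{APEP}_{n,h}(k) &= \sum_{i=m_h}^{n-h}\big\{\eta_{i,h} + \mathbf{x}_i'(p_1)\big(\mathbf{a}_D(h,p_1) - \hat{\mathbf{a}}_i(h,k)\big)\big\}^2 \\
&= \sum_{i=m_h}^{n-h}\eta_{i,h}^2 + (1+o(1))\sum_{i=m_h}^{n-h}\big\{\mathbf{x}_i'(p_1)\big(\mathbf{a}_D(h,p_1) - \hat{\mathbf{a}}_i(h,k)\big)\big\}^2 + O(1) \quad \text{a.s.},
\end{aligned} \tag{3.43}$$

where $\hat{\mathbf{a}}_i(h,k)$ is now viewed as a $p_1$-dimensional vector with undefined entries
set to zero, and the second equality is ensured by Chow (1965). Since
(3.11) ensures that $\lim_{n\to\infty}\hat{\mathbf{a}}_n(h,k) = \mathbf{a}(h,k)$ a.s., we can rewrite (3.43) as

$$\mathrm{APEP}_{n,h}(k) = (1+o(1))\big(\mathbf{a}_D(h,p_1) - \mathbf{a}(h,k)\big)'\Big(\sum_{i=m_h}^{n-h}\mathbf{x}_i(p_1)\mathbf{x}_i'(p_1)\Big)\big(\mathbf{a}_D(h,p_1) - \mathbf{a}(h,k)\big) + \sum_{i=m_h}^{n-h}\eta_{i,h}^2 + o\Big(\sum_{i=m_h}^{n-h}\mathbf{x}_i'(p_1)\mathbf{x}_i(p_1)\Big) + O(1) \quad \text{a.s.}$$

Consequently, (3.41) follows from (3.11) and the fact that

$$\big(\mathbf{a}_D(h,p_1) - \mathbf{a}_D(h,k)\big)'\Gamma(p_1)\big(\mathbf{a}_D(h,k) - \mathbf{a}(h,k)\big) = 0,$$

where $\mathbf{a}_D(h,k)$ and $\mathbf{a}(h,k)$ are viewed as $p_1$-dimensional vectors with undefined
entries set to zero; this orthogonality is the normal equation characterizing
$\mathbf{a}_D(h,k)$ as the coefficient vector of the best linear $h$-step predictor based on $\mathbf{x}_i(k)$.

Since the proof of (3.42) is similar to that of (3.41), we omit the details
to save space. □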

Armed with the previous results, we are now in a position to show the
asymptotic efficiency of $(k_n, j_n)$.

Theorem 3.4. Let the assumptions of Theorem 3.1 hold. Then, for $K \ge p_1$,
$(k_n, j_n)$ is asymptotically efficient in the sense of (3.3).

Proof. First note that for $k \ge p_1$, $f_{2,1}(k) = k$. Hence Theorem 3.2 yields
that for $k > p_1$, $P(\mathrm{APED}_{n,1}(p_1) < \mathrm{APED}_{n,1}(k) \text{ eventually}) = 1$. Since the
first term on the right-hand side of (3.42) is positive, Theorems 3.2 and
3.3 give, for $k < p_1$, $P(\mathrm{APED}_{n,1}(p_1) < \mathrm{APED}_{n,1}(k) \text{ eventually}) = 1$. As
a result, $k^{(2)}_{n,1} = p_1 + o(1)$ a.s. This fact and Theorems 3.1-3.3 further ensure
that

$$P\big((k_n, j_n) \in C_{h,K} \text{ eventually}\big) = 1,$$

as asserted. □
Remark 4. In this remark, we consider the problem of choosing $p_h$, $h \ge 1$,
under model (1.1). For $h = 1$ we have shown in the proof of Theorem 3.4
that

$$k^{(2)}_{n,h} = p_h + o(1) \quad \text{a.s.} \tag{3.44}$$

This motivated us to ask whether (3.44) still holds with $h \ge 2$. To investigate
this question, first assume $p_h = p_1$ (or, equivalently, $b_{h-1} \ne 0$). By (ii) of
Theorem 2.3 and Theorems 3.2 and 3.3, this assumption guarantees that
(3.44) holds with $h \ge 2$. [In fact, by (i) of Theorem 2.3 and Theorems 3.1
and 3.3, this assumption also ensures that for $h \ge 2$,

$$\lim_{n\to\infty} k^P_{n,h} = p_1 = p_h \quad \text{a.s.},$$

where $k^P_{n,h} = \arg\min_{1 \le k \le K}\mathrm{APEP}_{n,h}(k)$.] However, when $h$ is large and
$p_h \le k < p_1$, it is very difficult to verify $f_{2,h}(k) < f_{2,h}(k+1)$, which is an essential
property for (3.44) with $h \ge 2$ to be true. [Note that (2.10) only ensures that
$f_{2,h}(k) < f_{2,h}(k+1)$ holds with $k \ge p_1$.] Consequently, with the arguments used
in the present article, (3.44) cannot be guaranteed without extra constraints
on the parameter space.

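To establish a strongly consistent estimator of $p_h$ without constraints
on the parameter space, we consider the multistep generalization of the
Bayesian information criterion (BIC),

$$\mathrm{BIC}_{n,h}(k) = \log\hat\sigma^2_{n,h}(k) + \frac{k c_n}{n},$$

where $k \ge 1$, $c_n \to \infty$, $c_n = o(n)$, $\liminf_{n\to\infty} c_n/\log n > 0$ and
$\hat\sigma^2_{n,h}(k) = n^{-1}\sum_{i=k}^{n-h}\{x_{i+h} - \mathbf{x}_i'(k)\hat{\mathbf{a}}_{D,n}(h,k)\}^2$, with $\hat{\mathbf{a}}_{D,n}(h,k)$ the least squares
estimate of $\mathbf{a}_D(h,k)$ based on $x_1, \ldots, x_n$. When the assumptions of Theorem 3.2
hold, arguments similar to those used to show Theorem 3.2 of the
present study and Theorem 3.6 of Wei (1992) yield that

$$k^B_{n,h} = p_h + o(1) \quad \text{a.s.},$$

where $k^B_{n,h} = \arg\min_{1 \le k \le K}\mathrm{BIC}_{n,h}(k)$. Therefore, the difficulty encountered
with $k^{(2)}_{n,h}$ does not exist for $k^B_{n,h}$.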
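A minimal Python sketch of the multistep BIC, assuming the penalty term is $k c_n/n$ and taking $c_n = \log n$ as one admissible choice ($c_n \to \infty$, $c_n = o(n)$, $c_n/\log n \to 1$). It fits the direct regression of $x_{t+h}$ on $(x_t, \ldots, x_{t-k+1})$ by least squares without an intercept and normalizes the residual sum of squares by $n$; the function names are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve a @ sol = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (m[r][n] - tail) / m[r][r]
    return sol


def direct_rss(x, k, h):
    """Residual sum of squares of the LS regression of x[t+h] on (x[t], ..., x[t-k+1])."""
    rows = range(k - 1, len(x) - h)
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for t in rows:
        reg = [x[t - j] for j in range(k)]
        for r in range(k):
            xty[r] += reg[r] * x[t + h]
            for c in range(k):
                xtx[r][c] += reg[r] * reg[c]
    coef = solve(xtx, xty)
    return sum((x[t + h] - sum(coef[j] * x[t - j] for j in range(k))) ** 2 for t in rows)


def bic_order(x, h, max_order, c_n=None):
    """Return (argmin order, [BIC_{n,h}(k) for k = 1..max_order])."""
    n = len(x)
    if c_n is None:
        c_n = math.log(n)  # assumed choice of c_n; any admissible sequence could be used
    scores = [math.log(direct_rss(x, k, h) / n) + k * c_n / n
              for k in range(1, max_order + 1)]
    best = min(range(max_order), key=scores.__getitem__) + 1
    return best, scores


# Illustrative stationary AR(2) data: x_t = 0.5 x_{t-1} - 0.6 x_{t-2} + e_t.
rng = random.Random(2004)
x = [0.0, 0.0]
for _ in range(2998):
    x.append(0.5 * x[-1] - 0.6 * x[-2] + rng.gauss(0.0, 1.0))
k_b, bic_values = bic_order(x, 1, 4)
```

With $h = 1$ and this much data, dropping the second lag inflates $\hat\sigma^2_{n,1}(1)$ far more than the penalty $c_n/n$, so order 1 is not selected.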

4. An extension to subset autoregressions. When some $a_i$'s with $1 \le
i \le p_1 - 1$ in model (1.1) or some $a_i(h,p_h)$'s with $1 \le i \le p_h - 1$ in model
(1.7) are zero, a multistep predictor obtained without estimating
these zero coefficients can be more efficient than the best predictor among
families I and II. This motivated us to consider the selection of subset au-
toregressive models. Several algorithms are available for choosing
the one-step prediction model under this more general setting [e.g., McClave
(1975) and Haggan and Oyetunji (1984)]. While these algorithms have their
own advantages, none has been shown to possess optimal properties
from the (multistep) MSPE point of view. An algorithm modified
from $(k_n, j_n)$ is therefore proposed in this section as a remedy.

To begin with, let $\theta_i = 1$ if $x_{t+1-i}$ is included as a regressor variable for
predicting $x_{t+h}$, and let $\theta_i = 0$ if $x_{t+1-i}$ is not included. Then the family of
all (nontrivial) subset autoregressions can be expressed as

$$\Theta = \{\theta = (\theta_1, \ldots, \theta_K): \theta_i = 0 \text{ or } 1 \text{ for } 1 \le i \le K, \text{ and } \theta_i = 1 \text{ for at least one } i\},$$

where $K$ is as defined in Section 1. When model $\theta \in \Theta$ is adopted, the cor-
responding plug-in and direct predictors of $x_{n+h}$ are denoted by $\hat{x}_{n+h}(\theta)$
[or $(\theta, 1)$] and $\tilde{x}_{n+h}(\theta)$ [or $(\theta, 2)$], respectively, and the multistep MSPEs
of $\hat{x}_{n+h}(\theta)$ and $\tilde{x}_{n+h}(\theta)$ are denoted by $\mathrm{MSPEP}_{n,h}(\theta)$ and $\mathrm{MSPED}_{n,h}(\theta)$,
respectively. In addition, we use $\mathrm{APEP}_{n,h}(\theta)$ and $\mathrm{APED}_{n,h}(\theta)$, respec-
tively, to denote the multistep APEs based on sequential plug-in and di-
rect predictors when $\theta$ is used. Let $\theta^{(1)} = (\theta_1^{(1)}, \ldots, \theta_K^{(1)})$ and $\theta^{(2)} =
(\theta_1^{(2)}, \ldots, \theta_K^{(2)})$ be members of $\Theta$. Then we say $\theta^{(1)} \le \theta^{(2)}$ if $\theta_i^{(1)} \le \theta_i^{(2)}$ for
all $1 \le i \le K$, and $\theta^{(1)} \not\le \theta^{(2)}$ if $\theta_i^{(1)} > \theta_i^{(2)}$ for at least one $i$. Now the modi-
fied model selection procedure $(\theta_n, j_n)$, with $\theta_n \in \Theta$ and $1 \le j_n \le 2$, is given
as follows.

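Step 1. Define $\theta_{n,1} = \arg\min_{\theta \in \Theta}\mathrm{APED}_{n,1}(\theta)$.

Step 2. Define $\theta^{(2)}_{n,h} = \arg\min_{\theta \in \Theta}\mathrm{APED}_{n,h}(\theta)$ and define
$\theta^{(1)}_{n,h} = \arg\min_{\theta \in \Theta_1}\mathrm{APEP}_{n,h}(\theta)$, where $\Theta_1 = \{\theta: \theta \in \Theta \text{ and } \theta_{n,1} \le \theta\}$.

Step 3. If $\mathrm{APED}_{n,h}(\theta^{(2)}_{n,h}) > \mathrm{APEP}_{n,h}(\theta^{(1)}_{n,h})$, then $(\theta_n, j_n) = (\theta^{(1)}_{n,h}, 1)$;
otherwise $(\theta_n, j_n) = (\theta^{(2)}_{n,h}, 2)$.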
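Steps 1-3 can be sketched in Python as follows, assuming the three families of APE values have already been computed and are supplied as dictionaries keyed by the 0/1 tuples $\theta$. The APE numbers in the usage example are hypothetical, chosen only to exercise the logic, and ties in an argmin are broken by dictionary order because the text leaves them unspecified. All names are hypothetical.

```python
from itertools import product


def subset_family(K):
    """All nontrivial 0/1 vectors theta = (theta_1, ..., theta_K), i.e. the set Theta."""
    return [theta for theta in product((0, 1), repeat=K) if any(theta)]


def leq(theta1, theta2):
    """Componentwise order: theta1 <= theta2 iff theta1_i <= theta2_i for every i."""
    return all(a <= b for a, b in zip(theta1, theta2))


def select(aped_1, aped_h, apep_h):
    """Steps 1-3 of (theta_n, j_n); each argument maps theta -> APE value."""
    # Step 1: one-step model chosen by the (one-step) direct APE.
    theta_n1 = min(aped_1, key=aped_1.get)
    # Step 2: best direct model, and best plug-in model containing theta_n1.
    theta_dir = min(aped_h, key=aped_h.get)
    candidates = [theta for theta in apep_h if leq(theta_n1, theta)]
    theta_plug = min(candidates, key=apep_h.get)
    # Step 3: choose the plug-in pair only if its APE is strictly smaller.
    if aped_h[theta_dir] > apep_h[theta_plug]:
        return theta_plug, 1
    return theta_dir, 2


# Hypothetical APE values for K = 2 (not taken from the paper).
aped_1 = {(0, 1): 5.0, (1, 0): 3.0, (1, 1): 3.5}
apep_h = {(0, 1): 1.0, (1, 0): 4.0, (1, 1): 2.5}
aped_h = {(0, 1): 3.0, (1, 0): 2.8, (1, 1): 2.9}
choice = select(aped_1, aped_h, apep_h)
```

Here $(0,1)$ has the smallest plug-in APE but is excluded from $\Theta_1$ because it does not contain $\theta_{n,1} = (1,0)$; the procedure returns $((1,1), 1)$.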



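To show the validity of $(\theta_n, j_n)$, let us recall models (1.1) and (1.7) again,
and define $\theta^* = (\theta_1^*, \ldots, \theta_K^*)$ and $\theta^{**} = (\theta_1^{**}, \ldots, \theta_K^{**})$, where $\theta_i^* = 1$ if $a_i \ne 0$,
$\theta_i^* = 0$ if $a_i = 0$ or $i > p_1$, $\theta_i^{**} = 1$ if $a_i(h,p_h) \ne 0$, and $\theta_i^{**} = 0$ if
$a_i(h,p_h) = 0$ or $i > p_h$. Therefore, $\theta^*$ and $\theta^{**}$ are, respectively, the most
parsimonious correct models for the plug-in and direct predictors. Follow-
ing (3.1) and (3.2), the loss functions of $\hat{x}_{n+h}(\theta)$ and $\tilde{x}_{n+h}(\theta)$ are defined as

$$E_{1,h}(\theta) = \begin{cases} \lim_{n\to\infty} n\big(\mathrm{MSPEP}_{n,h}(\theta) - \sigma_h^2\big), & \text{if } \theta^* \le \theta, \\ \infty, & \text{if } \theta^* \not\le \theta, \end{cases} \tag{4.1}$$

and

$$E_{2,h}(\theta) = \begin{cases} \lim_{n\to\infty} n\big(\mathrm{MSPED}_{n,h}(\theta) - \sigma_h^2\big), & \text{if } \theta^{**} \le \theta, \\ \infty, & \text{if } \theta^{**} \not\le \theta, \end{cases} \tag{4.2}$$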
respectively, where the existence of the above limits is guaranteed by argu-
ments similar to those used to obtain Theorems 2.1 and 2.2. [Note that we
can also obtain expressions for the above limits like those on the right-hand sides
of (2.2) and (2.3). However, these expressions are not presented here, since
they are not needed in the following analysis.] A model selection criterion
$(\theta_n, j_n)$ with $\theta_n \in \Theta$ and $1 \le j_n \le 2$ is said to be asymptotically efficient if

$$P\big((\theta_n, j_n) \in B_{h,K} \text{ eventually}\big) = 1, \tag{4.3}$$

where

$$B_{h,K} = \Big\{(\theta, j): \theta \in \Theta,\ 1 \le j \le 2 \text{ and } E_{j,h}(\theta) = \min_{\theta_1 \in \Theta,\ 1 \le j_1 \le 2} E_{j_1,h}(\theta_1)\Big\}.$$
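The bookkeeping behind (4.1)-(4.3) can be sketched in Python: build $\theta^*$ (or $\theta^{**}$) from a coefficient vector, assign infinite loss to every $\theta$ that does not contain it, and collect the minimizing pairs into $B_{h,K}$. The finite limits in $E_{j,h}(\theta)$ cannot be computed from a finite sample, so the sketch takes them as inputs; the numbers used below are hypothetical, and all function names are illustrative.

```python
import math
from itertools import product


def indicator(coefs, K, tol=0.0):
    """theta*-style vector: entry i is 1 iff the i-th coefficient is nonzero (|c| > tol)."""
    padded = list(coefs)[:K] + [0.0] * max(0, K - len(coefs))
    return tuple(1 if abs(c) > tol else 0 for c in padded)


def leq(theta1, theta2):
    """Componentwise order on 0/1 tuples."""
    return all(a <= b for a, b in zip(theta1, theta2))


def loss_or_inf(theta, theta_star, limit_value):
    """E_{j,h}(theta): the supplied limit if theta_star <= theta, otherwise infinity."""
    return limit_value if leq(theta_star, theta) else math.inf


def efficient_set(losses):
    """B_{h,K}: all (theta, j) keys whose loss equals the overall minimum."""
    best = min(losses.values())
    return {key for key, value in losses.items() if value == best}


theta_star = indicator([0.5, 0.0, -0.3], 3)   # (1, 0, 1)
theta_2star = (1, 0, 0)                       # hypothetical direct-model support
losses = {}
for theta in product((0, 1), repeat=3):
    if any(theta):
        # Hypothetical finite limits, used only to exercise the definitions.
        losses[(theta, 1)] = loss_or_inf(theta, theta_star, 2.0 * sum(theta))
        losses[(theta, 2)] = loss_or_inf(theta, theta_2star, 1.5 * sum(theta) + 1.0)
b_hk = efficient_set(losses)
```

With these inputs the only efficient pair is the direct predictor based on $\theta = (1,0,0)$.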
The main result of this section is stated as follows.

Theorem 4.1. Let the assumptions of Theorem 3.1 hold. Then $(\theta_n, j_n)$
is asymptotically efficient in the sense of (4.3).

Theorem 4.1 can be shown by arguments similar to those used to show
Theorems 3.1-3.4; to save space, the details are omitted. Theorems 3.4
and 4.1 yield that, for sufficiently large $n$, the predictor selected by $(\theta_n, j_n)$
is at least as efficient as the one selected by $(k_n, j_n)$. Before leaving this
section, we note that the main disadvantage of $(\theta_n, j_n)$ is its time-consuming
nature, since it needs to compute the multistep APEs for all $2^K - 1$ subset
autoregressive models in $\Theta$ and for two different prediction methods. However,
with fast computers and efficient recursive formulas, the
computing time needed to complete this task is modest, provided $K$ is
not too large.

5. Concluding remarks. One of the main purposes of this article was
to find the optimal multistep predictor in finite-order AR models from the
honest MSPE point of view. Since both the plug-in and the direct predic-
tors are considered, it is not possible to achieve this goal by identifying the
order of the smallest correct model, as discussed in Section 2. To resolve this
problem, a new predictor selection procedure, $(k_n, j_n)$, is proposed. We show
that for sufficiently large $n$, $(k_n, j_n)$ can achieve the above goal by choosing
the best combination of the prediction order and the prediction method. In
Section 4 this procedure is extended to the situation where all possible sub-
set autoregressions are included as candidate models. On the other hand,
the parameter set where (1.8) occurs has Lebesgue measure zero, so one
may argue that (1.8) is unlikely to occur in practice and, hence, question the
necessity of constructing $(k_n, j_n)$. In response to this criticism, it is
worth noting that $(k_n, j_n)$ asymptotically dominates traditional multistep
prediction procedures, which select the one-step prediction order by certain
consistent order selection criteria and then forecast $x_{n+h}$ through the plug-
in (or direct) method. More precisely, the predictor selected by $(k_n, j_n)$ has
at least the same asymptotic efficiency as the predictors selected by the
traditional procedures for all points of $A$, and is asymptotically more effi-
cient than the latter for some nonempty subset of $A$ [since the set where
(1.8) occurs is nonempty for $h \ge 2$]. Moreover, other advantages of
$(k_n, j_n)$, besides its treatment of the case where (1.8) occurs, are
emphasized at the end of Section 1.

The validity of $(k_n, j_n)$ is justified in the stationary case. It is believed
that the predictor chosen by this procedure may also perform well in unstable
cases. However, since the proofs of Theorems 3.1 and 3.2 (especially Theorem
3.1) rely heavily on stationarity assumptions, their extensions to unstable cases
are not straightforward. Further work is needed to overcome these technical
difficulties.

This article assumes that the order of the underlying AR model is fi-
nite. Hence, the frequently discussed AR($\infty$) model is excluded. When the
data are known to be generated from an AR($\infty$) model, it is common to
use an AR model of increasing (with $n$) order to predict future observa-
tions; see, for example, Shibata (1980), Gerencsér (1992), Bhansali (1996)
and Ing and Wei (2003, 2004). In this situation, Ing and Wei (2004) showed
that AIC is asymptotically efficient for the honest one-step prediction. On
the other hand, Ing and Yu (2002) showed that the one-step APE is not
asymptotically efficient in this situation. To rectify the difficulty of using
APE in AR($\infty$) models, Ing and Yu (2002) proposed a modification of
APE, $\mathrm{APE}_\delta$. Instead of accumulating squares of sequential prediction er-
rors from stage $m_1$ [see (1.9)], $\mathrm{APE}_\delta$ is obtained by accumulating squares
of sequential prediction errors from stage $n\delta$, where $0 < \delta < 1$ may de-
pend on $n$. Under certain regularity conditions, they showed that $\mathrm{APE}_\delta$
is asymptotically efficient in AR($\infty$) models. Motivated by this result, we
expect that an efficient multistep predictor selection criterion can be es-
tablished in an AR($\infty$) model once the asymptotic behavior of $\mathrm{APEP}_{n,h}(k)$
and of $\mathrm{APED}_{n,h}(k)$, with $h \ge 2$ and $m_h$ replaced by $n\delta$, $0 < \delta < 1$, is
clarified under this model. As a final remark, we note that when it is
a priori unknown whether the order of the underlying AR model is fi-
nite or infinite, the choice between the original APE and its modification
(by Ing and Yu) becomes a challenging problem even for one-step predic-
tions. Can a modification of $(k_n, j_n)$ be obtained for the optimal multistep
prediction without order assumptions? This is the subject of ongoing re-
search.

APPENDIX 



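Proof of (3.16). By (3.11), Theorem 3 of Lai and Wei (1983) and
Chow (1965), (3.16) is guaranteed by showing that

$$\sum_{i=m_h}^{n-h}\frac{\|\mathbf{x}_i(k)\|^2}{(i-k)^2}\Big\|\sum_{j=k}^{i-1}\mathbf{x}_j(k)\varepsilon_{j+1}\Big\|^2 = O(\log n) \quad \text{a.s.} \tag{A.1}$$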



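To obtain (A.1), first observe that, since $\mathbf{x}_j(k) = (x_j, \ldots, x_{j-k+1})'$, the term on the
left-hand side of (A.1) can be expressed as

$$\begin{aligned}
&\sum_{i=m_h}^{n-h}\Big\{\Big(\sum_{l=0}^{k-1}x_{i-l}^2\Big)\frac{1}{(i-k)^2}\sum_{c=0}^{k-1}\Big(\sum_{j=k}^{i-1}x_{j-c}\varepsilon_{j+1}\Big)^2\Big\} \\
&\quad = \sum_{l=0}^{k-1}\sum_{c=0}^{k-1}\Big\{\sum_{i=m_h}^{n-h}\frac{1}{(i-k)^2}\Big(\sum_{j_1=k}^{i-1}\sum_{j_2=k}^{i-1}x_{j_1-c}x_{j_2-c}\varepsilon_{j_1+1}\varepsilon_{j_2+1}\Big)x_{i-l}^2\Big\}.
\end{aligned} \tag{A.2}$$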

In view of (A.2), if we can show that

$$\sum_{i=m_h}^{n-h}\frac{1}{(i-k)^2}\Big(\sum_{j_1=k}^{i-1}\sum_{j_2=k}^{i-1}x_{j_1-c}x_{j_2-c}\varepsilon_{j_1+1}\varepsilon_{j_2+1}\Big)x_{i-l}^2 = O(\log n) \quad \text{a.s.} \tag{A.3}$$

for each $0 \le l \le k-1$ and $0 \le c \le k-1$, then (A.1) follows. In what follows,
we prove this property only for the case $c = l = 0$, because the results for
other values of $c$ and $l$ can be obtained similarly.
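The expansion (A.2) is purely algebraic: it splits each squared norm into its $k$ coordinates. A small Python check of a single summand is given below; the common factor $1/(i-k)^2$ is omitted, random numbers stand in for $x_j$ and $\varepsilon_j$, and the function names are hypothetical.

```python
import random


def lhs_a2_term(x, e, k, i):
    """||sum_{j=k}^{i-1} x_j(k) e_{j+1}||^2 * ||x_i(k)||^2, with x_j(k) = (x_j, ..., x_{j-k+1})."""
    v = [sum(x[j - c] * e[j + 1] for j in range(k, i)) for c in range(k)]
    return sum(vc * vc for vc in v) * sum(x[i - l] ** 2 for l in range(k))


def rhs_a2_term(x, e, k, i):
    """Sum over l, c of the double sum over (j1, j2), times x_{i-l}^2, as in (A.2)."""
    total = 0.0
    for l in range(k):
        for c in range(k):
            double = sum(x[j1 - c] * x[j2 - c] * e[j1 + 1] * e[j2 + 1]
                         for j1 in range(k, i) for j2 in range(k, i))
            total += double * x[i - l] ** 2
    return total


rng = random.Random(3)
x = [rng.gauss(0.0, 1.0) for _ in range(20)]
e = [rng.gauss(0.0, 1.0) for _ in range(20)]
lhs_val = lhs_a2_term(x, e, 3, 12)
rhs_val = rhs_a2_term(x, e, 3, 12)
```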
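Note that

$$\begin{aligned}
\sum_{i=m_h}^{n-h}\frac{1}{(i-k)^2}\Big(\sum_{j_1=k}^{i-1}\sum_{j_2=k}^{i-1}x_{j_1}x_{j_2}\varepsilon_{j_1+1}\varepsilon_{j_2+1}\Big)x_i^2
&\le C^*\sum_{i=k+1}^{n-h}\frac{1}{i^2}\Big(\sum_{j_1=k}^{i-1}\sum_{j_2=k}^{i-1}x_{j_1}x_{j_2}\varepsilon_{j_1+1}\varepsilon_{j_2+1}\Big)x_i^2 \\
&= C^*\sum_{j_1=k}^{n-h-1}\sum_{j_2=k}^{n-h-1}x_{j_1}x_{j_2}\varepsilon_{j_1+1}\varepsilon_{j_2+1}\sum_{i=r}^{n-h}\frac{x_i^2}{i^2},
\end{aligned} \tag{A.4}$$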
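where $C^*$ is some positive number and $r = \max\{j_1+1, j_2+1\}$. Observe that,
since the first sum below telescopes and $(i-1)^{-2} - i^{-2} = (2i-1)/\{(i-1)^2 i^2\}$,

$$\sum_{i=r}^{n-h}\frac{x_i^2}{i^2} = \sum_{i=r}^{n-h}\Big(\frac{s_i^2 - i\gamma_0}{i^2} - \frac{s_{i-1}^2 - (i-1)\gamma_0}{(i-1)^2}\Big) + \sum_{i=r}^{n-h}\frac{(2i-1)\{s_{i-1}^2 - (i-1)\gamma_0\}}{(i-1)^2 i^2} + \gamma_0\sum_{i=r}^{n-h}\frac{1}{i^2} = A_n - B_r + C_{n,r} + D_{n,r},$$

where $s_i^2 = \sum_{j=1}^{i} x_j^2$, $\gamma_0 = E(x_1^2)$,

$$A_n = \frac{s_{n-h}^2 - (n-h)\gamma_0}{(n-h)^2}, \qquad B_r = \frac{s_{r-1}^2 - (r-1)\gamma_0}{(r-1)^2},$$

$$C_{n,r} = \sum_{i=r}^{n-h}\frac{(2i-1)\{s_{i-1}^2 - (i-1)\gamma_0\}}{(i-1)^2 i^2} \quad \text{and} \quad D_{n,r} = \gamma_0\sum_{i=r}^{n-h}\frac{1}{i^2}.$$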

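This and (A.4) yield

$$\sum_{i=k+1}^{n-h}\frac{1}{i^2}\Big(\sum_{j_1=k}^{i-1}\sum_{j_2=k}^{i-1}x_{j_1}x_{j_2}\varepsilon_{j_1+1}\varepsilon_{j_2+1}\Big)x_i^2 = \sum_{j_1=k}^{n-h-1}\sum_{j_2=k}^{n-h-1}x_{j_1}x_{j_2}\varepsilon_{j_1+1}\varepsilon_{j_2+1}\big(A_n - B_r + C_{n,r} + D_{n,r}\big). \tag{A.5}$$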

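Since

$$\sum_{j_1=k}^{n-h-1}\sum_{j_2=k}^{n-h-1}x_{j_1}x_{j_2}\varepsilon_{j_1+1}\varepsilon_{j_2+1}A_n = o(1)\,\frac{1}{n}\Big(\sum_{j=k}^{n-h-1}x_j\varepsilon_{j+1}\Big)^2 \quad \text{a.s.},$$

by Wei [(1987), equation (2.30)] and (3.11),

$$\sum_{j_1=k}^{n-h-1}\sum_{j_2=k}^{n-h-1}x_{j_1}x_{j_2}\varepsilon_{j_1+1}\varepsilon_{j_2+1}A_n = o(\log n) \quad \text{a.s.} \tag{A.6}$$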
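By (3.11), an analogy with Lemma 2.1 of Wei (1992) and Chow (1965),

$$\begin{aligned}
\sum_{j_1=k}^{n-h-1}\sum_{j_2=k}^{n-h-1}x_{j_1}x_{j_2}\varepsilon_{j_1+1}\varepsilon_{j_2+1}B_r
&= \sum_{j=k}^{n-h-1}x_j^2\varepsilon_{j+1}^2 B_{j+1} + 2\sum_{j_2=k+1}^{n-h-1}\Big(\sum_{j_1=k}^{j_2-1}x_{j_1}\varepsilon_{j_1+1}\Big)x_{j_2}\varepsilon_{j_2+1}\frac{s_{j_2}^2 - j_2\gamma_0}{j_2^2} \\
&= o(\log n) + o\Big(\sum_{j_2=k+1}^{n-h-1}\frac{1}{j_2^2}\Big(\sum_{j_1=k}^{j_2-1}x_{j_1}\varepsilon_{j_1+1}\Big)^2 x_{j_2}^2\Big) \quad \text{a.s.}
\end{aligned} \tag{A.7}$$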



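Exchanging the order of summation, we have

$$\begin{aligned}
\sum_{j_1=k}^{n-h-1}\sum_{j_2=k}^{n-h-1}x_{j_1}x_{j_2}\varepsilon_{j_1+1}\varepsilon_{j_2+1}C_{n,r}
&= \sum_{i=k+1}^{n-h}\frac{(2i-1)\{s_{i-1}^2 - (i-1)\gamma_0\}}{(i-1)^2 i^2}\Big(\sum_{j=k}^{i-1}x_j\varepsilon_{j+1}\Big)^2 \\
&= o\Big(\sum_{i=k+1}^{n-h}\frac{1}{i^2}\Big(\sum_{j=k}^{i-1}x_j\varepsilon_{j+1}\Big)^2\Big) \quad \text{a.s.},
\end{aligned} \tag{A.8}$$

where the second equality is ensured by (3.11). Observe that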

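$$\begin{aligned}
\sum_{i=k+1}^{n-h}\frac{1}{i^2}\Big(\sum_{j=k}^{i-1}x_j\varepsilon_{j+1}\Big)^2
&= \sum_{j_1=k}^{n-h-1}\sum_{j_2=k}^{n-h-1}x_{j_1}x_{j_2}\varepsilon_{j_1+1}\varepsilon_{j_2+1}\sum_{i=r}^{n-h}\frac{1}{i^2} \\
&= \sum_{j=k}^{n-h-1}x_j^2\varepsilon_{j+1}^2\sum_{i=j+1}^{n-h}\frac{1}{i^2} + 2\sum_{j_2=k+1}^{n-h-1}\Big(\sum_{j_1=k}^{j_2-1}x_{j_1}\varepsilon_{j_1+1}\Big)x_{j_2}\varepsilon_{j_2+1}\sum_{i=j_2+1}^{n-h}\frac{1}{i^2} \\
&= O(\log n) + o\Big(\sum_{j_2=k+1}^{n-h-1}\frac{1}{j_2^2}\Big(\sum_{j_1=k}^{j_2-1}x_{j_1}\varepsilon_{j_1+1}\Big)^2 x_{j_2}^2\Big) \quad \text{a.s.},
\end{aligned} \tag{A.9}$$

where the last equality follows from an argument similar to that used for
showing (A.7). As a result, (A.8) and (A.9) yield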

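$$\sum_{j_1=k}^{n-h-1}\sum_{j_2=k}^{n-h-1}x_{j_1}x_{j_2}\varepsilon_{j_1+1}\varepsilon_{j_2+1}C_{n,r} = o\Big(\log n + \sum_{j_2=k+1}^{n-h-1}\frac{1}{j_2^2}\Big(\sum_{j_1=k}^{j_2-1}x_{j_1}\varepsilon_{j_1+1}\Big)^2 x_{j_2}^2\Big) \quad \text{a.s.} \tag{A.10}$$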



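Reasoning as for (A.9),

$$\sum_{j_1=k}^{n-h-1}\sum_{j_2=k}^{n-h-1}x_{j_1}x_{j_2}\varepsilon_{j_1+1}\varepsilon_{j_2+1}D_{n,r} = o\Big(\sum_{j_2=k+1}^{n-h-1}\frac{1}{j_2^2}\Big(\sum_{j_1=k}^{j_2-1}x_{j_1}\varepsilon_{j_1+1}\Big)^2 x_{j_2}^2\Big) + O(\log n) \quad \text{a.s.} \tag{A.11}$$

Consequently, (A.3) [and hence (A.1)] follows from (A.4)-(A.7), (A.10) and
(A.11): writing $U_n$ for the left-hand side of (A.5), these results give
$U_n = O(\log n) + o(U_n)$ a.s., so that $U_n = O(\log n)$ a.s. □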
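The decomposition of $\sum_{i=r}^{n-h} x_i^2/i^2$ used above holds for every sequence and every value of $\gamma_0$: it is a telescoping sum, plus a sum weighted by $(i-1)^{-2} - i^{-2}$, plus $D_{n,r} = \gamma_0\sum_{i=r}^{n-h} i^{-2}$. A Python sketch checking it numerically is given below. It emulates 1-based indexing with $x_i$ stored in `x[i-1]`, lets `last` stand for $n - h$, requires $r \ge 2$, and uses variable names chosen here for illustration.

```python
import random


def decomposition(x, gamma0, r, last):
    """Return (sum_{i=r}^{last} x_i^2 / i^2, telescoped - weighted-sum + ... form)."""
    s2 = [0.0]
    for v in x:
        s2.append(s2[-1] + v * v)  # s2[i] = x_1^2 + ... + x_i^2
    direct = sum(x[i - 1] ** 2 / i ** 2 for i in range(r, last + 1))
    a_n = (s2[last] - last * gamma0) / last ** 2               # telescoping, upper end
    b_r = (s2[r - 1] - (r - 1) * gamma0) / (r - 1) ** 2       # telescoping, lower end
    c_nr = sum((2 * i - 1) * (s2[i - 1] - (i - 1) * gamma0) / ((i - 1) ** 2 * i ** 2)
               for i in range(r, last + 1))
    d_nr = gamma0 * sum(1.0 / i ** 2 for i in range(r, last + 1))
    return direct, a_n - b_r + c_nr + d_nr


rng = random.Random(11)
x = [rng.gauss(0.0, 1.0) for _ in range(50)]
direct_val, parts_val = decomposition(x, 1.3, 4, 45)
```

The two returned values agree up to rounding for any data, any $\gamma_0$ and any $2 \le r \le$ `last`.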